[ 
https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578343#comment-16578343
 ] 

ASF GitHub Bot commented on METRON-1707:
----------------------------------------

Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/1150
  
    Do we have to use groupByKey in the Spark implementation? Could we use 
reduceByKey to build the profiles instead, since profilers are by 
definition reducible? I've seen groupByKey cause performance problems (see 
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
 for a good discussion of this).
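
To illustrate the distinction, here is a rough sketch in plain Python with no Spark dependency. It is not the code in the PR. The sample data, the `(count, total)` summary, and the function names are all hypothetical stand-ins for a profile. The point is only that when the summary can be merged pairwise, each partition can be reduced locally before partial results are combined, whereas the group-first approach has to hold every value for a key before summarizing.

```python
from collections import defaultdict

# Hypothetical (key, value) messages spread over two partitions.
partitions = [
    [("10.0.0.1", 3), ("10.0.0.2", 5), ("10.0.0.1", 7)],
    [("10.0.0.1", 1), ("10.0.0.2", 2)],
]


def merge(a, b):
    """Merge two partial (count, total) summaries.

    The operation is associative and commutative, which is what makes
    a reduce-by-key strategy applicable.
    """
    return (a[0] + b[0], a[1] + b[1])


def group_then_aggregate(parts):
    """groupByKey-style: gather every value per key, then summarize."""
    grouped = defaultdict(list)
    for part in parts:
        for key, value in part:
            grouped[key].append(value)  # all raw values are held per key
    return {key: (len(vals), sum(vals)) for key, vals in grouped.items()}


def reduce_by_key(parts):
    """reduceByKey-style: summarize inside each partition, then merge."""
    partials = []
    for part in parts:
        local = {}
        for key, value in part:
            summary = (1, value)
            local[key] = merge(local[key], summary) if key in local else summary
        partials.append(local)  # only one small summary per key moves on

    result = {}
    for local in partials:
        for key, summary in local.items():
            result[key] = merge(result[key], summary) if key in result else summary
    return result


grouped_result = group_then_aggregate(partitions)
reduced_result = reduce_by_key(partitions)
```

Both functions produce the same per-key summaries. In Spark the analogous RDD calls would be `groupByKey` followed by a per-key aggregation versus `reduceByKey` with a merge function. Whether the real profile definitions can be expressed as such a merge is exactly the open question here.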


> Port Profiler to Spark
> ----------------------
>
>                 Key: METRON-1707
>                 URL: https://issues.apache.org/jira/browse/METRON-1707
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> Create a port of the Profiler that runs in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
