[ https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578343#comment-16578343 ]
ASF GitHub Bot commented on METRON-1707:
----------------------------------------

GitHub user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/1150

    Do we have to use groupByKey in the Spark implementation? Is it not possible to use reduceByKey to build the profiles, since profilers are by definition reducible? I've seen groupByKey cause performance problems (see https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html for a good discussion on this).

> Port Profiler to Spark
> ----------------------
>
>                 Key: METRON-1707
>                 URL: https://issues.apache.org/jira/browse/METRON-1707
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> Create a port of the Profiler that runs in Spark.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
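
The groupByKey versus reduceByKey point in the comment above can be illustrated with a small sketch. This is a plain-Python simulation, not Spark or Metron code: the partition layout, the `host-a`/`host-b` keys, and the helper names `group_by_key` and `reduce_by_key` are all hypothetical, and a simple sum stands in for whatever reduction a profile would actually perform. It only shows why combining within each partition before the shuffle can move fewer records when the operation is reducible.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: (profile key, value) pairs spread over two partitions.
partitions = [
    [("host-a", 1), ("host-b", 2), ("host-a", 3)],
    [("host-a", 4), ("host-b", 5), ("host-b", 6)],
]


def group_by_key(partitions):
    """Shuffle every record, then group; nothing is combined before the shuffle."""
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(partitions, combine):
    """Combine within each partition first, then shuffle only the partial results."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = combine(local[key], value) if key in local else value
        partials.extend(local.items())
    # Merge the per-partition partial results after the simulated shuffle.
    merged = {}
    for key, value in partials:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged, len(partials)


def add(a, b):
    # Stand-in for a profile's reduction; assumed associative and commutative.
    return a + b


grouped, grouped_shuffled = group_by_key(partitions)
grouped_totals = {key: reduce(add, values) for key, values in grouped.items()}
reduced_totals, reduced_shuffled = reduce_by_key(partitions, add)

print(grouped_totals, "records shuffled:", grouped_shuffled)
print(reduced_totals, "records shuffled:", reduced_shuffled)
```

Both paths produce the same per-key totals, but in this toy layout the grouping path moves all six records while the reducing path moves four partial results. The saving depends on the combine function being associative and commutative, which is the property the comment refers to when it calls profilers reducible.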