Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/1150
  
    Do we have to use groupByKey in the Spark implementation? Is it not 
possible to use reduceByKey to build the profiles, since profilers are by 
definition reducible? I've seen groupByKey cause performance problems (see 
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
 for a good discussion of this).
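
To make the concern concrete, here is a minimal plain-Python sketch of the two strategies. It is not Spark code and not the profiler's actual logic: the partition data, the `merge` function, and both helper names are hypothetical, and `merge` simply stands in for whatever associative update a profile applies. It only models how many records would cross the shuffle in each case.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (profile key, value) pairs spread over two partitions.
partitions = [
    [("host-a", 3), ("host-b", 5), ("host-a", 4)],
    [("host-a", 1), ("host-b", 2)],
]

def merge(x, y):
    # Assumed associative and commutative; a stand-in for a profile update.
    return x + y

def group_by_key_style(parts):
    # Every record is shuffled, and all values for a key are buffered
    # in memory before being folded.
    shuffled = [kv for part in parts for kv in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {k: reduce(merge, vs) for k, vs in grouped.items()}, len(shuffled)

def reduce_by_key_style(parts):
    # Each partition is combined locally first, so at most one record
    # per key per partition is shuffled.
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = merge(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = merge(result[key], value) if key in result else value
    return result, len(shuffled)

grouped_result, grouped_shuffled = group_by_key_style(partitions)
reduced_result, reduced_shuffled = reduce_by_key_style(partitions)
print(grouped_result == reduced_result, grouped_shuffled, reduced_shuffled)
```

Both strategies produce the same per-key result, but the reduce-style version shuffles fewer records and never holds a full list of values per key. The gap grows with the number of records per key in each partition.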

