GitHub user simonellistonball commented on the issue:
https://github.com/apache/metron/pull/1150
Do we have to use groupByKey in the Spark implementation? Is it not
possible to use reduceByKey to build the profiles, since profilers are by
definition reducible? groupByKey shuffles every individual value across the
cluster, whereas reduceByKey combines values within each partition before the
shuffle, and I've seen groupByKey cause performance problems (see
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
for a good discussion of this).
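To make the difference concrete, here is a minimal plain-Python simulation of the two shuffle strategies. It is not Spark code and not the Metron profiler: the partitions, the IP-address keys, the `(count, total)` profile state and the `merge_state` function are all hypothetical. It assumes the profile merge is associative and commutative, which is what reduceByKey requires. Both strategies produce the same profiles, but the reduceByKey-style version sends only one partial result per key per partition through the "shuffle".

```python
from collections import defaultdict
from functools import reduce


def group_by_key_then_reduce(partitions, func):
    """Simulates groupByKey followed by a reduce: every raw record is shuffled."""
    shuffled = 0
    groups = defaultdict(list)
    for part in partitions:
        for key, value in part:
            groups[key].append(value)  # each record travels to its key's reducer
            shuffled += 1
    result = {key: reduce(func, values) for key, values in groups.items()}
    return result, shuffled


def reduce_by_key(partitions, func):
    """Simulates reduceByKey: combine inside each partition, then shuffle partials."""
    shuffled = 0
    merged = {}
    for part in partitions:
        local = {}  # map-side combine, no data movement yet
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, partial in local.items():  # one record per key per partition
            merged[key] = func(merged[key], partial) if key in merged else partial
            shuffled += 1
    return merged, shuffled


# Hypothetical profile state: (message count, total bytes); merging is associative.
def merge_state(a, b):
    return (a[0] + b[0], a[1] + b[1])


partitions = [
    [("10.0.0.1", (1, 200)), ("10.0.0.1", (1, 300)), ("10.0.0.2", (1, 50))],
    [("10.0.0.1", (1, 100)), ("10.0.0.2", (1, 70)), ("10.0.0.2", (1, 30))],
]

if __name__ == "__main__":
    grouped, grouped_shuffled = group_by_key_then_reduce(partitions, merge_state)
    reduced, reduced_shuffled = reduce_by_key(partitions, merge_state)
    print(grouped, grouped_shuffled)
    print(reduced, reduced_shuffled)
```

With six input records spread over two partitions, the groupByKey-style path shuffles all six, while the reduceByKey-style path shuffles four partial states; the gap grows with the number of records per key per partition, which is where groupByKey tends to hurt.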
