[GitHub] metron issue #1150: METRON-1707 Port Profiler to Spark [Feature Branch]

simonellistonball Mon, 13 Aug 2018 07:34:03 -0700

Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/1150
  
    Do we have to use groupByKey in the Spark implementation? Would it be 
possible to use reduceByKey to build the profiles instead, since profilers are 
by definition reducible? I've seen groupByKey cause performance problems (see 
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
 for a good discussion of this).
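
    To illustrate the concern, here is a minimal sketch in plain Python. It 
does not use Spark or the Profiler code; it only simulates the two strategies 
over two hypothetical partitions. The keys, values, the `merge` function (a 
simple sum) and both helper functions are assumptions made up for the example. 
It counts how many records would cross the shuffle when every record is sent 
as-is (groupByKey style) versus when each partition is combined locally first 
(reduceByKey style).

```python
from collections import defaultdict
from functools import reduce

# Hypothetical data: (profile key, measurement) pairs spread over two partitions.
partitions = [
    [("host-a", 3), ("host-b", 5), ("host-a", 7)],
    [("host-a", 1), ("host-b", 2), ("host-b", 4)],
]


def merge(x, y):
    """Associative, commutative merge of two partial values (a sum here)."""
    return x + y


def group_by_key_style(parts):
    """Shuffle every record, then reduce each key's full list of values."""
    shuffled = [rec for part in parts for rec in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    result = {key: reduce(merge, values) for key, values in groups.items()}
    return result, len(shuffled)


def reduce_by_key_style(parts):
    """Combine inside each partition first, then shuffle only the partials."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = merge(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = merge(result[key], value) if key in result else value
    return result, len(shuffled)


grouped, grouped_shuffled = group_by_key_style(partitions)
reduced, reduced_shuffled = reduce_by_key_style(partitions)
print(grouped == reduced, grouped_shuffled, reduced_shuffled)
```

    Both strategies produce the same per-key result, but the locally combined 
version moves fewer records between partitions. That only works when the merge 
is associative and commutative, which is the property the question above 
relies on.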
