Sahil Takiar created HIVE-20108:
-----------------------------------
Summary: Investigate alternatives to groupByKey
Key: HIVE-20108
URL: https://issues.apache.org/jira/browse/HIVE-20108
Project: Hive
Issue Type: Improvement
Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar
We use {{groupByKey}} for aggregations (or, if
{{hive.spark.use.groupby.shuffle}} is false,
{{repartitionAndSortWithinPartitions}}).
{{groupByKey}} has a drawback: it can't spill records within a single key group,
so all the values for one key must fit in memory at once. It also seems to do
some unnecessary work in Spark's {{Aggregator}} (I'm not positive about this part).
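To illustrate the key-group problem, here is a minimal pure-Python model. It is not Spark or Hive code, and the function names are hypothetical. It assumes a simple sum aggregation. The groupByKey-style path buffers every value of a key in one list before aggregating. The incremental path keeps only one running total per key, roughly like a combiner.

```python
from collections import defaultdict


def group_by_key(records):
    """Model of groupByKey: every value for a key is buffered in one list."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)  # the whole key group is held at once
    return dict(groups)


def sum_after_grouping(records):
    """Aggregate only after each key's full value list has been materialized."""
    return {key: sum(values) for key, values in group_by_key(records).items()}


def sum_incrementally(records):
    """Model of a combiner: one running total per key, no per-key value list."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


# A skewed input: the hot key "a" has 1001 values.
records = [("a", 1)] * 1000 + [("b", 2), ("a", 3)]
print(len(group_by_key(records)["a"]))  # 1001 values buffered for key "a"
print(sum_after_grouping(records) == sum_incrementally(records))  # True
```

Both paths produce the same totals. In the grouped path, however, memory use for the hot key grows with the number of its values, which is what becomes a problem when a key group cannot be spilled.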
{{repartitionAndSortWithinPartitions}} is better, but the sorting within
partitions isn't necessary for aggregations.
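The point about sorting can be sketched the same way. This is again a hypothetical pure-Python model, not Spark's implementation. It assumes that records are already hash-partitioned, so all values of a key land in the same partition. A sort-based path orders the partition by key and then folds each run of equal keys. A hash-based path makes one pass over the records in whatever order they arrive.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(partition):
    """Sort-based: order the partition by key, then fold each run of equal keys."""
    ordered = sorted(partition, key=itemgetter(0))  # extra O(n log n) sort
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


def aggregate_hashed(partition):
    """Hash-based: one pass, one accumulator per key, no ordering required."""
    totals = {}
    for key, value in partition:
        totals[key] = totals.get(key, 0) + value
    return totals


partition = [("b", 2), ("a", 1), ("c", 5), ("a", 4), ("b", 3)]
print(aggregate_hashed(partition) == aggregate_sorted(partition))  # True
```

The results match, so for an aggregation the sort only adds work. The trade-off is that the hash table must hold one accumulator per distinct key. A production version would therefore still need some spill strategy when there are many keys.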