Sahil Takiar created HIVE-20270:
-----------------------------------
Summary: Don't serialize hashCode for groupByKey
Key: HIVE-20270
URL: https://issues.apache.org/jira/browse/HIVE-20270
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Similar to HIVE-20032, but for {{groupByKey}}. The tricky part with
{{groupByKey}} is that the {{hashCode}} must be preserved until the key has
been partitioned (via the {{HashPartitioner}}); after that, it no longer needs
to be preserved. The {{groupByKey}} operator in Spark does require a
{{hashCode}}, since it puts everything in a map, but it can use a different
hash code than the one specified in {{HiveKey}}. The hash code in {{HiveKey}}
only matters for determining which partition the key is assigned to.
The drawback is that computing the hash code for each {{HiveKey}} might require
more CPU resources, so we should profile it to check whether the overhead is
acceptable.
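
A rough sketch of the idea, not the actual Hive change: the key carries a partitioning hash that is used before the shuffle but is not written out, and after deserialization the map-side {{hashCode}} is computed from the key bytes instead. Everything named here ({{SketchKey}}, {{partitionFor}}, {{roundTrip}}) is hypothetical, Java serialization stands in for whatever serializer is actually configured, and {{Arrays.hashCode}} is only an assumed choice for the post-shuffle hash.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class ShuffleKeySketch {

    /** Hypothetical stand-in for HiveKey; this is not the real class. */
    static final class SketchKey implements Serializable {
        private static final long serialVersionUID = 1L;

        private final byte[] bytes;
        // Only needed to pick a partition, so it is marked transient and
        // is not written by Java serialization (it reads back as 0).
        private final transient int partitionHash;

        SketchKey(byte[] bytes, int partitionHash) {
            this.bytes = bytes.clone();
            this.partitionHash = partitionHash;
        }

        int partitionHash() {
            return partitionHash;
        }

        // Hash used by in-memory maps after the shuffle: derived from the
        // key bytes, so it does not depend on the unserialized field.
        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof SketchKey
                && Arrays.equals(bytes, ((SketchKey) other).bytes);
        }
    }

    /** Assumed partitioning rule: non-negative modulo of the partition hash. */
    static int partitionFor(SketchKey key, int numPartitions) {
        return Math.floorMod(key.partitionHash(), numPartitions);
    }

    /** Simulates the shuffle by serializing and deserializing the key. */
    static SketchKey roundTrip(SketchKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(key);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SketchKey) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SketchKey key = new SketchKey(new byte[] {1, 2, 3}, 42);
        System.out.println("partition before shuffle: " + partitionFor(key, 8));

        SketchKey received = roundTrip(key);
        System.out.println("partition hash after shuffle: " + received.partitionHash());
        System.out.println("still equal for grouping: " + received.equals(key));
    }
}
```

In this sketch the partition is chosen from the original hash before the key is written, while grouping on the receiving side relies only on the key bytes. The per-key cost of computing the byte-based hash is the CPU overhead that would need profiling.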
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)