mahesh kumar behera created HIVE-24471:
------------------------------------------
Summary: Add support for combiner in hash mode group aggregation
Key: HIVE-24471
URL: https://issues.apache.org/jira/browse/HIVE-24471
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera
In map-side group aggregation, a partial grouped aggregation is computed to
reduce the amount of data written to disk by the map task. With hash
aggregation, where the input data is not sorted, a hash table is used. If the
hash table grows beyond a configurable limit, its contents are flushed to disk
and a new hash table is created. If the reduction achieved by the hash table is
less than the minimum hash aggregation reduction calculated at compile time,
map-side aggregation is converted to streaming mode. So if the first few
batches of records do not yield a significant reduction, the mode is switched
to streaming. This can hurt performance when subsequent batches of records
contain fewer distinct values. To mitigate this, a combiner can be added to the
map task after the keys are sorted. This ensures that aggregation is performed
where possible and reduces the data written to disk.
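
The behaviour described above can be illustrated with a minimal, standalone
sketch. It is not Hive code: the class CombinerSketch, the methods mapSide and
combine, and the parameters checkInterval and minReduction are hypothetical
names chosen for illustration. The sketch assumes a COUNT(*) aggregation,
checks the reduction only once after the first checkInterval rows, and leaves
out the flush that happens when the hash table exceeds its size limit. A
TreeMap stands in for the sort-then-combine step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only; names and structure are assumptions, not Hive APIs.
class CombinerSketch {

    // Map-side COUNT(*) per key: starts in hash mode and falls back to
    // streaming if the first checkInterval rows do not reduce well enough.
    static List<Map.Entry<String, Long>> mapSide(
            List<String> keys, int checkInterval, double minReduction) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Map<String, Long> table = new HashMap<>();
        boolean streaming = false;
        int seen = 0;

        for (String key : keys) {
            if (streaming) {
                // Streaming mode: each row is forwarded without aggregation.
                out.add(new AbstractMap.SimpleEntry<>(key, 1L));
                continue;
            }
            table.merge(key, 1L, Long::sum);
            seen++;
            // One-time check: ratio of hash table entries to input rows.
            if (seen == checkInterval
                    && (double) table.size() / seen > minReduction) {
                streaming = true;
                out.addAll(table.entrySet()); // flush partial aggregates
                table = new HashMap<>();
            }
        }
        out.addAll(table.entrySet()); // flush whatever is left in hash mode
        return out;
    }

    // Combiner: merges partial counts that share a key. The TreeMap keeps
    // keys ordered, standing in for "after the keys are sorted".
    static Map<String, Long> combine(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> combined = new TreeMap<>();
        for (Map.Entry<String, Long> row : rows) {
            combined.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return combined;
    }
}
```

For example, with the keys a, b, c, d followed by six more rows of a, a
checkInterval of 4 and a minReduction of 0.5, the first four rows give no
reduction, so mapSide switches to streaming and emits 10 rows. Passing those
rows through combine brings the output down to 4 rows, one per distinct key.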