When using map-side aggregates, perform a single map-reduce group-by
--------------------------------------------------------------------
Key: HIVE-223
URL: https://issues.apache.org/jira/browse/HIVE-223
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Joydeep Sen Sarma
Today, even when we do map-side aggregation, we run multiple map-reduce jobs.
The reason for running multiple map-reduce jobs for a single group-by was the
fear of skew. When we do map-side aggregation, however, skew should mostly not
exist. There are two possible sources of skew:
- a large number of entries for a single grouping key: map-side aggregation
should take care of this
- a poor hash function that sends too many rows to one reducer: we should be
able to take care of this with good hash functions (and prime reducer counts)
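To make the second point concrete, here is a small Python sketch (not Hive code) of hash partitioning. It assumes the weakest possible hash, the key's own integer value taken modulo the reducer count; the names `partition` and `reducer_load` and the sample keys are hypothetical. Keys that share a factor with the reducer count all land on one reducer, while a prime reducer count spreads these particular keys evenly.

```python
from collections import Counter

def partition(key: int, num_reducers: int) -> int:
    # Hypothetical weak hash: identity followed by modulo. Used only to show
    # how the reducer count interacts with a pattern in the keys.
    return key % num_reducers

def reducer_load(keys, num_reducers):
    """Return how many keys each reducer receives."""
    load = Counter(partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]

# Hypothetical grouping keys that all share a factor of 10: 0, 10, ..., 990.
keys = list(range(0, 1000, 10))

print(reducer_load(keys, 10))  # [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(reducer_load(keys, 11))  # [10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```

A prime count is not a cure-all on its own (multiples of 11 would still collide with 11 reducers); a well-mixed hash function addresses key patterns more generally.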
So I think we should be able to use a single map-reduce stage when doing
map-side aggregation.
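The single-stage plan can be sketched as a toy simulation in Python. This is a sketch under stated assumptions, not Hive's implementation: each input split stands for one mapper, the aggregate is a hypothetical SUM(value), and `map_side_aggregate` and `single_stage_group_by` are illustrative names. Each mapper collapses its rows per key in a hash table before the shuffle, so a hot key contributes at most one row per mapper, and each reducer only merges the partial sums it receives.

```python
from collections import defaultdict

def map_side_aggregate(rows):
    """Hash-based partial aggregation inside one mapper: SUM(value) per key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return list(partial.items())  # one row per distinct key in this split

def single_stage_group_by(splits, num_reducers):
    # Map phase: aggregate each split locally, then hash-partition the partials.
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        for key, partial_sum in map_side_aggregate(split):
            buckets[hash(key) % num_reducers].append((key, partial_sum))
    # Reduce phase: each reducer merges the partial sums it received.
    # A key always hashes to the same reducer, so reducer outputs never overlap.
    result, shuffled_rows = {}, 0
    for bucket in buckets:
        shuffled_rows += len(bucket)
        merged = defaultdict(int)
        for key, partial_sum in bucket:
            merged[key] += partial_sum
        result.update(merged)
    return result, shuffled_rows

# Hypothetical input: 4 mappers, each with 10,000 rows of one hot key.
splits = [[("hot", 1)] * 10_000 + [("cold", 1)] for _ in range(4)]
result, shuffled = single_stage_group_by(splits, num_reducers=7)
print(sorted(result.items()))  # [('cold', 4), ('hot', 40000)]
print(shuffled)                # 8
```

Here 40,004 input rows shrink to 8 shuffled rows (2 keys times 4 mappers). The benefit depends on each mapper seeing relatively few distinct keys; with very high-cardinality keys the map-side hash table collapses little.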