when using map-side aggregates - perform single map-reduce group-by
-------------------------------------------------------------------

                 Key: HIVE-223
                 URL: https://issues.apache.org/jira/browse/HIVE-223
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


Today, even when we do map-side aggregates, we run multiple map-reduce jobs. 
However, the reason for doing multiple map-reduce group-bys (for single 
group-bys) was the fear of skew. When we are doing map-side aggregates, skew 
should not exist for the most part. There can be two reasons for skew:
- a large number of entries for a single group - map-side aggregates 
should take care of this
- a bad hash function that sends too much data to one reducer - we should 
be able to take care of this by having good hash functions (and prime-number 
reducer counts)
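
To illustrate the second point, here is a minimal sketch of why the reducer 
count matters. It assumes a plain `key % num_reducers` partitioner as a 
stand-in for the real hash function (not Hive's actual partitioner), and the 
keys are hypothetical integers that all share a common factor of 8.

```python
from collections import Counter

def reducer_load(keys, num_reducers):
    """Count how many keys land on each reducer under key % num_reducers."""
    return Counter(k % num_reducers for k in keys)

# Hypothetical grouping keys that all share a common factor (multiples of 8).
keys = [8 * i for i in range(1000)]

load_8 = reducer_load(keys, 8)  # reducer count shares the factor with the keys
load_7 = reducer_load(keys, 7)  # prime reducer count

print("8 reducers:", dict(load_8))
print("7 reducers:", dict(sorted(load_7.items())))
```

With 8 reducers every key lands on reducer 0, while with 7 reducers the same 
keys spread almost evenly across all of them.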

So I think we should be able to do a single-stage map-reduce when doing 
map-side aggregates.
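
A rough sketch of the idea, assuming a simple COUNT group-by: each mapper 
pre-aggregates its split in an in-memory hash table, so a heavily repeated key 
reaches the shuffle as at most one partial row per mapper. The function names, 
the CRC32 partitioner and the sample data are all hypothetical and only 
simulate the data flow; this is not Hive's implementation.

```python
import zlib
from collections import Counter, defaultdict

def map_side_aggregate(split):
    """Mapper: pre-aggregate counts per key in an in-memory hash table."""
    return Counter(split)

def single_stage_group_by(splits, num_reducers):
    """Simulate one map-reduce pass of a COUNT group-by.

    Returns the final counts and the number of partial rows each reducer
    received from the shuffle.
    """
    reducers = [defaultdict(int) for _ in range(num_reducers)]
    shuffled = [0] * num_reducers
    for split in splits:
        for key, partial in map_side_aggregate(split).items():
            # Deterministic hash partitioning (CRC32 is an assumption here).
            r = zlib.crc32(key.encode("utf-8")) % num_reducers
            reducers[r][key] += partial  # reducer merges partial counts
            shuffled[r] += 1
    result = {}
    for table in reducers:
        result.update(table)
    return result, shuffled

# Four hypothetical splits, each dominated by the same skewed key "hot".
splits = [["hot"] * 1000 + ["k%d" % i] for i in range(4)]
result, shuffled = single_stage_group_by(splits, num_reducers=3)
print(result["hot"], sum(shuffled))
```

The 4000 "hot" input rows reach the reducers as only 4 partial rows (one per 
mapper), so the reducer that owns the skewed key does very little extra work.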

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
