[jira] Created: (HIVE-609) optimize multi-group by

Namit Jain (JIRA) Mon, 06 Jul 2009 12:59:38 -0700

optimize multi-group by 
------------------------

                 Key: HIVE-609
                 URL: https://issues.apache.org/jira/browse/HIVE-609
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain



For a query like:

from src
insert overwrite table dest1 select col1, count(distinct colx) group by col1
insert overwrite table dest2 select col2, count(distinct colx) group by col2;



If map-side aggregation is turned off, we currently run 4 map-reduce jobs.
The plan can be optimized to run in 3 map-reduce jobs by spraying over the
distinct column first and then aggregating the individual results.
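
To make the proposed plan concrete, here is a minimal Python sketch that
simulates it in memory. It is not Hive code and not the actual implementation:
the function names, the num_reducers parameter, and the (col1, col2, colx)
tuple layout are assumptions made for illustration, and NULL handling is
ignored. Job 1 sprays rows by colx, so each (group key, colx) pair is seen by
exactly one reducer; jobs 2 and 3 then only need to sum partial counts.

```python
from collections import defaultdict


def count_keys(pairs):
    """Count how many distinct (key, colx) pairs exist for each key."""
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return list(counts.items())


def spray_on_distinct(rows, num_reducers=4):
    """Job 1 (hypothetical): partition rows by colx and emit partial counts.

    Every row with a given colx lands in the same partition, so a
    (col1, colx) or (col2, colx) pair is never counted in two partitions.
    """
    partitions = defaultdict(list)
    for col1, col2, colx in rows:
        partitions[hash(colx) % num_reducers].append((col1, col2, colx))

    partials_col1, partials_col2 = [], []
    for part in partitions.values():
        # Deduplicate within the partition, then count per group key.
        pairs1 = {(col1, colx) for col1, _, colx in part}
        pairs2 = {(col2, colx) for _, col2, colx in part}
        partials_col1.extend(count_keys(pairs1))
        partials_col2.extend(count_keys(pairs2))
    return partials_col1, partials_col2


def sum_partials(partials):
    """Jobs 2 and 3 (hypothetical): sum the partial counts per group key."""
    totals = defaultdict(int)
    for key, n in partials:
        totals[key] += n
    return dict(totals)


def multi_group_by(rows, num_reducers=4):
    """Run the 3-job plan: one spray job, then one sum job per destination."""
    partials_col1, partials_col2 = spray_on_distinct(rows, num_reducers)
    dest1 = sum_partials(partials_col1)  # col1, count(distinct colx)
    dest2 = sum_partials(partials_col2)  # col2, count(distinct colx)
    return dest1, dest2
```

The sums are correct only because a single column decides the partitioning.
With a second distinct column there is no single spray key that keeps both
sets of pairs disjoint across reducers, which is the limitation noted below.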

This may not be possible if there are multiple distinct columns, but queries
like the one above are very common in data warehousing environments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
