optimize multi-group by
------------------------
Key: HIVE-609
URL: https://issues.apache.org/jira/browse/HIVE-609
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
For query like:
from src
insert overwrite table dest1 select col1, count(distinct colx) group by col1
insert overwrite table dest2 select col2, count(distinct colx) group by col2;
If map side aggregation is turned off, we currently do 4 map-reduce jobs.
The plan can be optimized by running it in 3 map-reduce jobs, by spraying over
the
distinct column first and then aggregating individual results.
This may not be possible if there are multiple distinct columns, but the above
query is very common
in data warehousing environments.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.