[ https://issues.apache.org/jira/browse/HIVE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-609:
--------------------------------

    Fix Version/s: 0.4.0

> optimize multi-group by
> ------------------------
>
>                 Key: HIVE-609
>                 URL: https://issues.apache.org/jira/browse/HIVE-609
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.609.1.patch, hive.609.10.patch, hive.609.11.patch,
> hive.609.2.patch, hive.609.3.patch, hive.609.4.patch, hive.609.5.patch,
> hive.609.6.patch, hive.609.7.patch
>
>
> For a query like:
> from src
> insert overwrite table dest1 select col1, count(distinct colx) group by col1
> insert overwrite table dest2 select col2, count(distinct colx) group by col2;
> If map-side aggregation is turned off, we currently run 4 map-reduce jobs.
> The plan can be optimized to run in 3 map-reduce jobs by spraying over the
> distinct column first and then aggregating the individual results.
> This may not be possible if there are multiple distinct columns, but the
> above query is very common in data warehousing environments.
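
The "spray over the distinct column, then aggregate" idea in the description can be sketched outside Hive. The Python below is only an illustration under stated assumptions, not the code in the attached patches: the function name multi_group_distinct_counts, the dict-based rows, and the num_partitions parameter are all hypothetical. It partitions rows by colx (standing in for one shared map-reduce job), computes partial distinct counts per grouping column inside each partition, and then sums the partials once per grouping column (standing in for the remaining per-insert jobs). That mapping onto 3 jobs is one reading of the description.

```python
from collections import defaultdict


def multi_group_distinct_counts(rows, group_cols, distinct_col, num_partitions=4):
    """Count distinct `distinct_col` values per group, for each column in `group_cols`.

    Hypothetical helper for illustration only; it is not part of Hive.
    """
    # Stage 1: "spray" rows by the distinct column, so every row carrying a
    # given distinct value lands in the same partition (a stand-in for a reducer).
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[distinct_col]) % num_partitions].append(row)

    # Inside each partition, deduplicate (group value, distinct value) pairs
    # and emit a partial distinct count per group value and grouping column.
    partials = []
    for part_rows in partitions.values():
        for col in group_cols:
            seen = {(row[col], row[distinct_col]) for row in part_rows}
            counts = defaultdict(int)
            for group_value, _ in seen:
                counts[group_value] += 1
            partials.append((col, dict(counts)))

    # Stage 2: one aggregation per grouping column sums the partial counts.
    # Summing is safe because a given distinct value lives in exactly one
    # partition, so no (group value, distinct value) pair is counted twice.
    results = {col: defaultdict(int) for col in group_cols}
    for col, counts in partials:
        for group_value, n in counts.items():
            results[col][group_value] += n
    return {col: dict(counts) for col, counts in results.items()}
```

Calling multi_group_distinct_counts(rows, ["col1", "col2"], "colx") returns one dictionary per grouping column, mirroring the two insert clauses in the query. The sketch also suggests why multiple distinct columns are harder: rows can be partitioned on only one key at a time, so the deduplication guarantee holds for a single distinct column.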