Namit Jain created HIVE-4137:
--------------------------------

Summary: optimize group by followed by joins for bucketed/sorted tables
Key: HIVE-4137
URL: https://issues.apache.org/jira/browse/HIVE-4137
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Consider the following scenario:

create table T1 (...) clustered by (key) sorted by (key) into 2 buckets;
create table T2 (...) clustered by (key) sorted by (key) into 2 buckets;
create table T3 (...) clustered by (key) sorted by (key) into 2 buckets;

SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;

insert overwrite table T3
select ..
from (select key, aggr() from T1 group by key) s1
full outer join
(select key, aggr() from T2 group by key) s2
on s1.key=s2.key;

Ideally, this query can be performed in a single map-only job: Group By -> SortMerge Join. All three tables are bucketed and sorted on key into the same number of buckets. So each mapper could read bucket i of T1 and T2, run the group by over the already-sorted rows, sort-merge join the two aggregated streams, and write bucket i of T3, with no reduce phase.
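The per-bucket plan can be sketched in Python. This is a minimal illustration under stated assumptions, not Hive's implementation. It assumes each hypothetical map task receives bucket i of T1 and bucket i of T2 as lists of (key, value) rows already sorted by key, and that both tables use the same bucket count and bucketing key. `sum` stands in for the unspecified `aggr()`. The function names `sorted_group_by`, `merge_full_outer_join`, and `map_task` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sorted_group_by(rows, aggr):
    """Streaming group by: rows are (key, value) pairs already sorted by key,
    so each key's rows are contiguous and can be aggregated in one pass."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, aggr(v for _, v in group)


def merge_full_outer_join(left, right):
    """Sort-merge full outer join of two key-sorted streams whose keys are
    unique (as the group by guarantees). Missing sides are emitted as None."""
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            yield l[0], l[1], None          # key only on the left side
            l = next(left, None)
        elif l is None or r[0] < l[0]:
            yield r[0], None, r[1]          # key only on the right side
            r = next(right, None)
        else:
            yield l[0], l[1], r[1]          # key present on both sides
            l, r = next(left, None), next(right, None)


def map_task(t1_bucket, t2_bucket, aggr=sum):
    """Hypothetical mapper: bucket i of T1 and T2 -> bucket i of T3.
    The output is still sorted by key, so it can be written as-is."""
    s1 = sorted_group_by(t1_bucket, aggr)
    s2 = sorted_group_by(t2_bucket, aggr)
    return list(merge_full_outer_join(s1, s2))


if __name__ == "__main__":
    # Toy data: even keys in bucket 0, odd keys in bucket 1 (illustrative only).
    t1 = [[(2, 1), (2, 3), (4, 5)], [(1, 1), (3, 2)]]
    t2 = [[(2, 10), (6, 1)], [(3, 4), (3, 4), (5, 7)]]
    for i in range(2):
        print(i, map_task(t1[i], t2[i]))
```

The design choice is that nothing crosses bucket boundaries. Equal keys always land in the same bucket index in both inputs, so each mapper is independent and no shuffle or reduce is needed.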