[jira] [Commented] (HIVE-4137) optimize group by followed by joins for bucketed/sorted tables

Lianhui Wang (JIRA) Thu, 07 Mar 2013 17:56:14 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596690#comment-13596690
 ]


Lianhui Wang commented on HIVE-4137:
------------------------------------

In addition, for bucketed/sorted tables, a single group by on the 
bucketing/sorting key only needs the map-side group by operator; the 
reduce-side group by operator is not needed.
Example:
select key, aggr() from T1 group by key;
The current plan is
TS-SEL-GBY-RS-GBY-SEL-FS
but it can be changed to the following plan:
TS-SEL-GBY-SEL-FS
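
To illustrate why the reduce-side group by can be dropped, here is a minimal Python sketch (not Hive code). It assumes each mapper reads one whole bucket file, that the rows in a bucket are sorted by key, and that a given key appears in only one bucket. The function name, the sample rows, and the use of a count as the aggregate are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_side_group_by(sorted_bucket_rows):
    """Stream over one bucket whose rows are already sorted by key.

    Rows with the same key are contiguous, so each group can be
    aggregated and emitted as a final result without a shuffle.
    The aggregate here is a simple row count (an assumption).
    """
    for key, rows in groupby(sorted_bucket_rows, key=itemgetter(0)):
        yield key, sum(1 for _ in rows)


# Hypothetical contents of the two buckets of T1 (key, value),
# clustered by key and sorted by key within each bucket.
bucket_0 = [(2, "a"), (2, "b"), (4, "c")]
bucket_1 = [(1, "d"), (3, "e"), (3, "f"), (3, "g")]

# One "mapper" per bucket; no reducer is involved.
results = [list(map_side_group_by(b)) for b in (bucket_0, bucket_1)]
```

Because no key spans two buckets in this setup, the per-bucket outputs are already the final groups, which is what the shorter plan relies on.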

                
> optimize group by followed by joins for bucketed/sorted tables
> --------------------------------------------------------------
>
>                 Key: HIVE-4137
>                 URL: https://issues.apache.org/jira/browse/HIVE-4137
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Consider the following scenario:
> create table T1 (...) clustered by (key) sorted by (key) into 2 buckets;
> create table T2 (...) clustered by (key) sorted by (key) into 2 buckets;
> create table T3 (...) clustered by (key) sorted by (key) into 2 buckets;
> SET hive.enforce.sorting=true;
> SET hive.enforce.bucketing=true;
> insert overwrite table T3
> select ..
> from 
> (select key, aggr() from T1 group by key) s1
> full outer join
> (select key, aggr() from T2 group by key) s2
> on s1.key=s2.key;
> Ideally, this query can be performed in a single map-only job.
> Group By -> SortMerge Join.
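
A rough Python sketch (not Hive code) of the join step in the scenario quoted above: once each side has been aggregated per bucket, the two outputs are sorted by key with one row per key, so a single merge pass can produce the full outer join. It assumes bucket i of T1 is paired with bucket i of T2; the function name and the sample data are hypothetical.

```python
def full_outer_merge_join(left, right):
    """Full outer join of two (key, value) streams.

    Both inputs must be sorted by key and contain each key at most
    once, which holds for the output of a group by on that key.
    Yields (key, left_value, right_value), with None for a missing side.
    """
    left, right = iter(left), iter(right)
    l, r = next(left, None), next(right, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key exists only on the left side.
            yield l[0], l[1], None
            l = next(left, None)
        elif l is None or r[0] < l[0]:
            # Key exists only on the right side.
            yield r[0], None, r[1]
            r = next(right, None)
        else:
            # Same key on both sides.
            yield l[0], l[1], r[1]
            l, r = next(left, None), next(right, None)


# Hypothetical aggregated outputs for one pair of matching buckets.
s1 = [(1, 1), (3, 3)]
s2 = [(1, 5), (2, 7)]
joined = list(full_outer_merge_join(s1, s2))
```

The merge never buffers more than one row per side, which is why the whole pipeline could in principle stay inside a single map task per bucket pair.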

