[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570645#comment-13570645
 ] 

Gunther Hagleitner commented on HIVE-2340:
------------------------------------------

[~navis]: I think in general the logic should be to copy numReducers from the 
parent to the child, not the other way around. If Hive makes a decent estimate 
of reducers for the parent, that's probably the number you want to carry into 
the combined reduce stage, because it means each reducer is doing the desired 
amount of work. Buckets and order by are the only special cases I can think of 
where the number needs to be fixed.
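
As a rough illustration of that rule (a sketch only, not Hive code): the 
function name and the child_fixed_reducers parameter below are hypothetical, 
and the assumption is that a child stage either has no required reducer count 
or has one imposed by bucketing or order by.

```python
from typing import Optional


def merged_reducer_count(parent_reducers: int,
                         child_fixed_reducers: Optional[int] = None) -> int:
    """Pick the reducer count for the combined reduce stage.

    parent_reducers: the estimate made for the parent stage.
    child_fixed_reducers: hypothetical parameter; set only when the child
        needs a fixed count (the bucket and order by cases), else None.
    """
    if child_fixed_reducers is not None:
        # Special case: the child's count is fixed and has to be kept.
        return child_fixed_reducers
    # General case: carry the parent's estimate into the combined stage.
    return parent_reducers
```

With no fixed requirement on the child, merged_reducer_count(32) keeps the 
parent's 32 reducers; with a fixed requirement of 1, merged_reducer_count(32, 1) 
keeps the child's single reducer.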

For those special cases, without knowing the cardinalities of the join/group 
by/tables, it's indeed difficult to guess whether the optimization should be on 
or off. However, what do you think of using a max ratio of parent reducers to 
child reducers instead of a fixed minimum number of reducers for the child? 
With a default of 4, maybe. I.e., if the parent has fewer than 4 times as many 
reducers as the child, collapse (assuming another job will be more expensive 
than running with the lower number of reducers); otherwise leave it alone. The 
optimization is only good if the input sizes of the child and parent reducers 
are similar, and expressing this as a ratio of the numbers of reducers is 
probably the closest we can get right now.
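
The proposed check could be sketched roughly as follows. This is not Hive 
code: the function name and the max_ratio parameter are hypothetical, and 
treating a non-positive reducer count as "unknown, so do not collapse" is an 
assumption made only for this example.

```python
def should_collapse(parent_reducers: int,
                    child_reducers: int,
                    max_ratio: int = 4) -> bool:
    """Decide whether to merge the child reduce stage into the parent.

    Collapse only when the parent has fewer than max_ratio times as many
    reducers as the child. The default of 4 is the value suggested above.
    """
    if parent_reducers <= 0 or child_reducers <= 0:
        # Assumption: an unknown or unset count gives no basis for merging.
        return False
    # Integer form of parent_reducers / child_reducers < max_ratio.
    return parent_reducers < max_ratio * child_reducers
```

For example, a parent with 12 reducers and a child with 4 gives a ratio of 3, 
so the stages would be collapsed; a parent with 16 and a child with 4 gives a 
ratio of exactly 4, so they would be left alone.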

This would enable the optimization for a larger body of queries (small tables, 
single input split, empty group by expression, etc.).
                
> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY 
> optimizer (cluster-by following group-by).
