[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570645#comment-13570645 ]
Gunther Hagleitner commented on HIVE-2340:
------------------------------------------

[~navis]: I think in general the logic should be to copy numReducers from parent to child, not the other way around. If Hive makes a decent estimate of reducers for the parent, that's probably the number you want to carry into the combined reduce stage, because that means each reducer is doing the desired amount of work. Buckets and order by are the only special cases I can think of where the number needs to be fixed.

For those special cases, without knowing the cardinalities of join/group by/tables, it's indeed difficult to guess if the optimization should be on or off. However, what do you think of using a max ratio of parent reducers/child reducers instead of a fixed minimum number of reducers for the child? With a default of 4, maybe. I.e.: if there are fewer than 4 times as many reducers in the parent as in the child, collapse (assuming another job will be more expensive than the lower number of reducers), else leave it alone.

The optimization is only good if the input sizes of the child and parent reducers are similar, and expressing this as a ratio of the number of reducers is probably the closest we can get right now. This would enable the optimization for a larger body of queries (small tables, single input split, empty group by expr, etc).
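To make the proposed rule concrete, here is a minimal sketch of the ratio check in Python. It is not Hive code: `ReduceStage`, `should_collapse`, `merged_num_reducers`, and the `fixed` flag are hypothetical names for illustration only. It also assumes that a non-positive reducer count means "unknown", and it applies the ratio check to every parent/child pair, which is only one possible reading of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReduceStage:
    """Hypothetical stand-in for a reduce stage; not a Hive class."""
    num_reducers: int
    fixed: bool = False  # assumption: True when bucketing or ORDER BY pins the count


def should_collapse(parent: ReduceStage, child: ReduceStage,
                    max_ratio: float = 4.0) -> bool:
    """Collapse only if the parent has fewer than max_ratio times the child's reducers."""
    # Assumption of this sketch: non-positive counts mean "unknown", so do nothing.
    if parent.num_reducers <= 0 or child.num_reducers <= 0:
        return False
    return parent.num_reducers < max_ratio * child.num_reducers


def merged_num_reducers(parent: ReduceStage, child: ReduceStage,
                        max_ratio: float = 4.0) -> Optional[int]:
    """Reducer count for the combined stage, or None if the stages stay separate."""
    if not should_collapse(parent, child, max_ratio):
        return None
    # Carry the parent's estimate into the combined stage unless the child's
    # count must stay fixed (the bucket / ORDER BY special cases).
    return child.num_reducers if child.fixed else parent.num_reducers


if __name__ == "__main__":
    # Parent 12, child 4: ratio 3 is below 4, so collapse and keep the parent's count.
    print(merged_num_reducers(ReduceStage(12), ReduceStage(4)))
    # Parent 100, child pinned to 1 (e.g. ORDER BY): ratio 100, leave it alone.
    print(merged_num_reducers(ReduceStage(100), ReduceStage(1, fixed=True)))
```

With the default of 4, a parent with 3 reducers feeding a single-reducer order by would still be collapsed, while a parent with 4 or more would not.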

> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt,
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch,
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY
> optimizer (cluster-by following group-by).