[
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566898#comment-13566898
]
Phabricator commented on HIVE-2340:
-----------------------------------
hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby
followed by a groupby".
Partial review
INLINE COMMENTS
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521 Not sure why
this is needed or why it defaults to 4. From the comment below, it seems this
is just to avoid the single-reducer order-by case for performance reasons. Is
that correct?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
Is this required, or is it just extra protection? The comment at the top of the
file says the mapjoin optimization happens before this (and probably should,
for performance reasons). Also, if I understand it correctly, "joinAndSort"
might be a better name than "fixed". You're basically saying that if an
optimization wants to change the join after this, it needs to make sure the
ordering of the keys is preserved, right?
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136
This change seems orthogonal to this patch.
ql/src/test/queries/clientpositive/reduce_deduplicate.q:7 There are not a lot
of tests for min.reducer=1; there is no order-by case, for instance. Maybe
reduce_deduplicate_extended.q should run with both the default and
min.reducer=1.
REVISION DETAIL
https://reviews.facebook.net/D1209
To: JIRA, navis
Cc: hagleitn
> optimize orderby followed by a groupby
> --------------------------------------
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
> Issue Type: Sub-task
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt,
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch,
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY
> optimizer (cluster-by following group-by).