Indhumathi27 commented on code in PR #6202:
URL: https://github.com/apache/hive/pull/6202#discussion_r2668471028
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java:
##########
@@ -1322,6 +1325,54 @@ private static void runTopNKeyOptimization(OptimizeTezProcContext procCtx)
ogw.startWalking(topNodes, null);
}
+ /*
+ * Build the ReduceSink matching pattern used by TopNKey optimization.
+ *
+ * For ORDER BY / LIMIT queries that do not involve GROUP BY or JOIN,
+ * applying TopNKey results in a performance regression. ReduceSink
+ * operators created only for ordering must therefore be excluded from
+ * TopNKey.
+ *
+ * When ORDER BY or LIMIT is present, restrict TopNKey to ReduceSink
+ * operators that originate from GROUP BY, JOIN, MAPJOIN, LATERAL VIEW
+ * JOIN or PTF query shapes. SELECT and FILTER operators may appear in
+ * between.
+ */
+ private static String buildTopNKeyRegexPattern(OptimizeTezProcContext procCtx) {
+ String reduceSinkOp = ReduceSinkOperator.getOperatorName() + "%";
+
+ boolean hasOrderOrLimit =
+ procCtx.parseContext.getQueryProperties().hasLimit() ||
+ procCtx.parseContext.getQueryProperties().hasOrderBy();
+
+ if (hasPTFReduceSink(procCtx) || !hasOrderOrLimit) {
+ return reduceSinkOp;
+ }
+
+ return "("
+ + GroupByOperator.getOperatorName() + "|"
+ + PTFOperator.getOperatorName() + "|"
+ + JoinOperator.getOperatorName() + "|"
+ + MapJoinOperator.getOperatorName() + "|"
+ + LateralViewJoinOperator.getOperatorName() + "|"
+ + CommonMergeJoinOperator.getOperatorName()
+ + ")"
+ + "(%("
+ + SelectOperator.getOperatorName() + "|"
+ + FilterOperator.getOperatorName()
Review Comment:
Yes. Some GROUP BY query patterns have a SELECT operator between the GROUP BY and ReduceSink operators.
Example:
select sum(key) as sum from src group by concat(key,value,value,value,value,value,value,value,value,value) order by sum limit 100
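
As a rough illustration of why the optional SELECT/FILTER group matters, here is a minimal standalone sketch. It is not Hive code: the short operator names (GBY, SEL, FIL, RS and so on) are assumed stand-ins for what each `getOperatorName()` returns, the tail of the pattern is an assumed completion of the hunk quoted above (which is cut off at this comment), and whole-string matching is used only to keep the demo simple.

```java
import java.util.regex.Pattern;

public class TopNKeyPatternSketch {
    // Assumed short names for the parent operators listed in the quoted hunk.
    static final String PARENTS = "GBY|PTF|JOIN|MAPJOIN|LVJ|MERGEJOIN";
    // Assumed short names for the operators allowed in between.
    static final String PASS_THROUGH = "SEL|FIL";

    // Assumed shape: a parent operator, zero or more SEL/FIL operators,
    // then the ReduceSink. '%' is the separator between operator names.
    static final Pattern RESTRICTED =
        Pattern.compile("(" + PARENTS + ")(%(" + PASS_THROUGH + "))*%RS%");

    // Returns true when the whole operator path fits the restricted pattern.
    static boolean matches(String operatorPath) {
        return RESTRICTED.matcher(operatorPath).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"GBY%RS%", "GBY%SEL%RS%", "GBY%SEL%FIL%RS%", "TS%SEL%RS%"};
        for (String path : paths) {
            System.out.println(path + " -> " + matches(path));
        }
    }
}
```

Under these assumptions, a path such as GBY%SEL%RS% is accepted only because of the optional SEL/FIL group, while a path that reaches the ReduceSink without one of the listed parent operators (TS%SEL%RS% here) is rejected.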
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]