Re: [PR] HIVE-29322: Avoid TopNKeyOperator When ReduceSink TopNkey Filtering Provides Better Pruning for ORDER BY LIMIT Queries [hive]

via GitHub Fri, 09 Jan 2026 03:34:05 -0800


zabetak commented on code in PR #6202:
URL: https://github.com/apache/hive/pull/6202#discussion_r2675850724



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java:
##########
@@ -111,6 +123,50 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
     return null;
   }
 
+  /**
+   * Returns true if the ReduceSink is only under an ORDER BY + LIMIT plan
+   * and has no GroupBy or Join operators in its upstream ancestry.
+   * This is used to disable TopNKey for pure ORDER BY LIMIT queries where
+   * LIMIT pushdown must take precedence.
+   */
+  public static boolean isOrderByLimitPath(ReduceSinkOperator rs) {

Review Comment:
   I understand that the performance issue was observed on rather simple 
queries, but the problem seems more general than that and goes beyond the SQL 
syntax. Since the PR changes how TopN optimizations are performed, it's 
important to understand the performance impact for interleaved operators in 
order to come up with a complete solution.
   
   The PR essentially redefines when the TopNKeyOptimization should be 
performed, so we cannot really say that PTF operators are out of scope, since 
there are code changes that explicitly target them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


