[GitHub] [hive] okumin commented on a diff in pull request #4471: HIVE-27484: Limit pushdown with offset generate wrong results

via GitHub Wed, 12 Jul 2023 03:28:08 -0700


okumin commented on code in PR #4471:
URL: https://github.com/apache/hive/pull/4471#discussion_r1260845442



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortJoinReduceRule.java:
##########
@@ -111,10 +113,17 @@ public void onMatch(RelOptRuleCall call) {
     RelNode inputLeft = join.getLeft();
     RelNode inputRight = join.getRight();
 
+    final RexBuilder rexBuilder = sortLimit.getCluster().getRexBuilder();
+    // We have to retain 0 ~ offset + limit because each task might not access the global offset
+    final RexNode inputOffset = rexBuilder.makeExactLiteral(BigDecimal.valueOf(0));
+    final int offset = sortLimit.offset == null ? 0 : RexLiteral.intValue(sortLimit.offset);
+    final int limit = RexLiteral.intValue(sortLimit.fetch);
+    final RexNode inputLimit = rexBuilder.makeExactLiteral(BigDecimal.valueOf(offset + limit));

Review Comment:
   `LIMIT n OFFSET m` without `ORDER BY` is semantically identical to `LIMIT n` when `m + n <= # of rows`, because without an ordering any `n` rows are a valid result. However, in a distributed environment, each input task cannot verify `m + n <= # of rows` locally. So, instead of pushing the offset down, this pushes a limit of `m + n` with an offset of 0 to the join input.
   I know an issue still exists when the join node is executed in parallel. It can happen even without CBO, and I will work on it in https://issues.apache.org/jira/browse/HIVE-27480.
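A minimal sketch of why `m + n` (with offset 0) is pushed down instead of the original offset. It assumes a simplified model, not Hive or Calcite code: the join is left out, the input is read by several parallel tasks whose outputs are concatenated, and the original `OFFSET`/`LIMIT` is applied once on top. The class and method names (`LimitOffsetPushdownSketch`, `offsetLimit`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not Hive or Calcite code): rows are spread over
 * several input tasks, and a final step applies the global OFFSET/LIMIT.
 */
public class LimitOffsetPushdownSketch {

    /** Skips the first `offset` rows, then keeps at most `limit` rows. */
    static List<Integer> offsetLimit(List<Integer> rows, int offset, int limit) {
        int from = Math.min(offset, rows.size());
        int to = Math.min(from + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    /**
     * Applies a per-task limit, concatenates the task outputs, and then
     * applies the original OFFSET/LIMIT once on the combined rows.
     */
    static List<Integer> run(List<List<Integer>> tasks, int offset, int limit, boolean pushOffset) {
        List<Integer> union = new ArrayList<>();
        for (List<Integer> taskRows : tasks) {
            if (pushOffset) {
                // Naive pushdown: every task skips its own first `offset` rows.
                union.addAll(offsetLimit(taskRows, offset, limit));
            } else {
                // Pushdown as in the diff: each task keeps rows 0 .. offset + limit.
                union.addAll(offsetLimit(taskRows, 0, offset + limit));
            }
        }
        return offsetLimit(union, offset, limit);
    }

    public static void main(String[] args) {
        // Two tasks with three rows each: 6 rows in total.
        List<List<Integer>> tasks = List.of(List.of(1, 2, 3), List.of(4, 5, 6));
        // LIMIT 2 OFFSET 3: since 3 + 2 <= 6, the query must return 2 rows.
        System.out.println(run(tasks, 3, 2, true).size());  // prints 0
        System.out.println(run(tasks, 3, 2, false).size()); // prints 2
    }
}
```

With the naive pushdown, each task holds only 3 rows, so skipping 3 rows locally leaves nothing and the query returns 0 rows. Pushing `offset + limit` keeps enough rows on every task for the global offset to be applied correctly afterwards.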



