okumin commented on code in PR #4471:
URL: https://github.com/apache/hive/pull/4471#discussion_r1260845442
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortJoinReduceRule.java:
##########
@@ -111,10 +113,17 @@ public void onMatch(RelOptRuleCall call) {
RelNode inputLeft = join.getLeft();
RelNode inputRight = join.getRight();
+ final RexBuilder rexBuilder = sortLimit.getCluster().getRexBuilder();
+ // We have to retain 0 ~ offset + limit because each task might not access the global offset
+ final RexNode inputOffset = rexBuilder.makeExactLiteral(BigDecimal.valueOf(0));
+ final int offset = sortLimit.offset == null ? 0 : RexLiteral.intValue(sortLimit.offset);
+ final int limit = RexLiteral.intValue(sortLimit.fetch);
+ final RexNode inputLimit = rexBuilder.makeExactLiteral(BigDecimal.valueOf(offset + limit));
Review Comment:
`LIMIT n OFFSET m` without `ORDER BY` is semantically identical to `LIMIT n`
when `m + n <= # of rows`. However, in a distributed environment, an input
task can't verify `m + n <= # of rows` locally, because it only sees its own
share of the rows. So this change pushes down a limit of `m + n` with offset 0.
I know an issue still exists when the join node itself is executed in
parallel. It can happen even without CBO, and I will work on it in
https://issues.apache.org/jira/browse/HIVE-27480.
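To make the reasoning concrete, here is a minimal sketch in plain Java, not Hive or Calcite code. The class `LimitPushdownSketch` and its helpers `slice` and `run` are hypothetical names introduced only for illustration, and the model assumes each input task truncates its own rows before a single global `OFFSET m LIMIT n` is applied on the merged output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model (an assumption, not Hive's execution engine): each task applies a
// local (offset, fetch), outputs are concatenated, then the global
// OFFSET/LIMIT is applied once on top.
class LimitPushdownSketch {
    // Hypothetical helper: rows [offset, offset + fetch) of the list, clamped to its size.
    static List<Integer> slice(List<Integer> rows, int offset, int fetch) {
        int from = Math.min(offset, rows.size());
        int to = Math.min(from + fetch, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    // Hypothetical helper: apply (taskOffset, taskFetch) per task, then (offset, fetch) globally.
    static List<Integer> run(List<List<Integer>> tasks, int taskOffset, int taskFetch,
                             int offset, int fetch) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> task : tasks) {
            merged.addAll(slice(task, taskOffset, taskFetch));
        }
        return slice(merged, offset, fetch);
    }

    public static void main(String[] args) {
        // One task happens to hold all six rows; the other holds none.
        List<List<Integer>> tasks = Arrays.asList(
            Arrays.asList(1, 2, 3, 4, 5, 6),
            Collections.<Integer>emptyList());
        int offset = 2; // m
        int fetch = 3;  // n

        // Pushing only n: the task keeps 3 rows, so the global offset leaves 1 row.
        System.out.println(run(tasks, 0, fetch, offset, fetch).size());          // prints 1
        // Pushing (m, n) as-is: the offset is applied twice, again leaving 1 row.
        System.out.println(run(tasks, offset, fetch, offset, fetch).size());     // prints 1
        // Pushing offset 0 and fetch m + n: all 3 expected rows survive.
        System.out.println(run(tasks, 0, offset + fetch, offset, fetch).size()); // prints 3
    }
}
```

In this model, only the `0` / `m + n` variant returns the full `n` rows when a single task holds every row, which is the case the tasks can't rule out on their own.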
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]