kasakrisz commented on code in PR #4471:
URL: https://github.com/apache/hive/pull/4471#discussion_r1268945654
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortJoinReduceRule.java:
##########
@@ -111,10 +113,17 @@ public void onMatch(RelOptRuleCall call) {
RelNode inputLeft = join.getLeft();
RelNode inputRight = join.getRight();
+ final RexBuilder rexBuilder = sortLimit.getCluster().getRexBuilder();
+ // We have to retain 0 ~ offset + limit because each task might not access
the global offset
Review Comment:
What does `0 ~ offset + limit` mean?
Is it `Each task has to retain offset + limit records because ...`?
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortJoinReduceRule.java:
##########
@@ -111,10 +113,17 @@ public void onMatch(RelOptRuleCall call) {
RelNode inputLeft = join.getLeft();
RelNode inputRight = join.getRight();
+ final RexBuilder rexBuilder = sortLimit.getCluster().getRexBuilder();
+ // We have to retain 0 ~ offset + limit because each task might not access
the global offset
+ final RexNode inputOffset =
rexBuilder.makeExactLiteral(BigDecimal.valueOf(0));
+ final int offset = sortLimit.offset == null ? 0 :
RexLiteral.intValue(sortLimit.offset);
+ final int limit = RexLiteral.intValue(sortLimit.fetch);
+ final RexNode inputLimit =
rexBuilder.makeExactLiteral(BigDecimal.valueOf(offset + limit));
Review Comment:
Could you please add this comment to the javadoc of this rule?
```
LIMIT l OFFSET o without ORDER BY is semantically identical to LIMIT l when
o + l <= # of rows. However, in a distributed environment, input tasks can't
verify o + l <= # of rows on each task. So, this pushes o + l.
```
Replace `m` and `n` with `o` and `l`.
Please mention that in a distributed env each task is processing only a
subset of the data.
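To illustrate the point for the javadoc, here is a minimal, self-contained sketch of why the pushed-down limit has to be `o + l` with offset 0. It is not Hive or Calcite code: the class `LimitPushdownSketch` and its methods are hypothetical, and it assumes each task sorts its own subset and a final operator applies the real offset and limit once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class LimitPushdownSketch {

    // Hypothetical stand-in for one task: it sees only its own subset of the
    // data, so it keeps rows [0, offset + limit) of its locally sorted input.
    public static List<Integer> taskTopRows(List<Integer> partition, int offset, int limit) {
        return partition.stream()
                .sorted()
                .limit((long) offset + limit)
                .collect(Collectors.toList());
    }

    // Final operator: merges the task outputs and applies the real
    // OFFSET and LIMIT exactly once, on the global ordering.
    public static List<Integer> globalWindow(List<List<Integer>> taskOutputs, int offset, int limit) {
        return taskOutputs.stream()
                .flatMap(List::stream)
                .sorted()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int offset = 1;
        int limit = 2;
        // Three tasks, each holding a different subset of the values 1..9.
        List<List<Integer>> partitions =
                List.of(List.of(9, 1, 5, 7), List.of(2, 8, 3), List.of(4, 6));

        List<List<Integer>> pushed = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            pushed.add(taskTopRows(partition, offset, limit));
        }

        // Same answer as sorting all nine values and taking OFFSET 1 LIMIT 2.
        System.out.println(globalWindow(pushed, offset, limit)); // prints [2, 3]
    }
}
```

If each task skipped `offset` rows itself, the first task above would drop the value 1 and the second would drop 2, so the merged result could no longer contain the correct rows. Keeping `offset + limit` rows per task avoids that because no task knows where its rows fall in the global order.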
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]