silundong commented on code in PR #4754:
URL: https://github.com/apache/calcite/pull/4754#discussion_r2702403825
##########
core/src/test/resources/sql/new-decorr.iq:
##########
@@ -42,4 +42,29 @@ SELECT * FROM t0 WHERE t0a <
!ok
+# [CALCITE-7382] The TopDownGeneralDecorrelator returns an error result when a
subquery contains a LIMIT 1
+!use scott
+SELECT dname, (SELECT emp.comm FROM "scott".emp where dept.deptno = emp.deptno
ORDER BY emp.comm limit 1) FROM "scott".dept;
++------------+--------+
+| DNAME | EXPR$1 |
++------------+--------+
+| ACCOUNTING | |
+| OPERATIONS | |
+| RESEARCH | |
+| SALES | 0.00 |
++------------+--------+
+(4 rows)
+
+!ok
+
+select dname, (select empno from emp where dept.deptno = emp.deptno limit 1)
from dept where deptno = 10;
Review Comment:
This test is intended to demonstrate that `Sort` needs to be rewritten as a
`[Filter-]Window` regardless of whether it has an `ORDER BY` or `LIMIT/OFFSET`.
As I updated in the comment
> because now the order/limit/offset has to be enforced per value of the
outer bindings instead of globally.
This is exactly what @asolimando was concerned about in his comment below.
Perhaps in practical applications there's almost no need to write it this way;
can I call this a edge case?
##########
core/src/main/java/org/apache/calcite/sql2rel/TopDownGeneralDecorrelator.java:
##########
@@ -598,50 +599,48 @@ public RelNode unnestInternal(Sort sort, boolean
allowEmptyOutputFromRewrite) {
RelCollation shiftCollation = sort.getCollation().apply(targetMapping);
builder.push(newInput);
- if (!sort.collation.getFieldCollations().isEmpty()
- && (sort.offset != null || sort.fetch != null)) {
- // the Sort with ORDER BY and LIMIT or OFFSET have to be changed during
rewriting because
- // now the limit has to be enforced per value of the outer bindings
instead of globally.
- // It can be rewritten using ROW_NUMBER() window function and filtering
on it,
- // see section 4.4 in paper Improving Unnesting of Complex Queries
- List<RexNode> partitionKeys = new ArrayList<>();
- for (CorDef corDef : corDefs) {
- int partitionKeyIndex =
requireNonNull(inputInfo.corDefOutputs.get(corDef));
- partitionKeys.add(builder.field(partitionKeyIndex));
- }
- RexNode rowNumber = builder.aggregateCall(SqlStdOperatorTable.ROW_NUMBER)
- .over()
- .partitionBy(partitionKeys)
- .orderBy(builder.fields(shiftCollation))
- .toRex();
- List<RexNode> projectsWithRowNumber = new ArrayList<>(builder.fields());
- projectsWithRowNumber.add(rowNumber);
- builder.project(projectsWithRowNumber);
-
- List<RexNode> conditions = new ArrayList<>();
- if (sort.offset != null) {
- RexNode greaterThenLowerBound =
- builder.call(
- SqlStdOperatorTable.GREATER_THAN,
- builder.field(projectsWithRowNumber.size() - 1),
- sort.offset);
- conditions.add(greaterThenLowerBound);
- }
- if (sort.fetch != null) {
- RexNode upperBound = sort.offset == null
- ? sort.fetch
- : builder.call(SqlStdOperatorTable.PLUS, sort.offset, sort.fetch);
- RexNode lessThenOrEqualUpperBound =
- builder.call(
- SqlStdOperatorTable.LESS_THAN_OR_EQUAL,
- builder.field(projectsWithRowNumber.size() - 1),
- upperBound);
- conditions.add(lessThenOrEqualUpperBound);
- }
- builder.filter(conditions);
- } else {
- builder.sortLimit(sort.offset, sort.fetch,
builder.fields(shiftCollation));
+ // the Sort have to be changed during rewriting because now the
order/limit/offset has to be
Review Comment:
The paper's original wording is:
> Subqueries with ORDER BY and LIMIT or OFFSET have to be changed....
It doesn't say what to do when only LIMIT or OFFSET is present. However, I
believe that even with only LIMIT or OFFSET, it should be rewritten as a window
function without ORDER clause.
This matches the test case that @rubenada mentioned in his comment. I
performed the same test in the [umbra-db
interface](https://umbra-db.com/interface/), and it indeed rewrite the LIMIT as
a window function. It seems our approach aligns with the paper's intent.
<img width="3840" height="1907" alt="image"
src="https://github.com/user-attachments/assets/3cc2e127-20f7-4292-a030-ed8d4bf60e30"
/>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]