github-actions[bot] commented on code in PR #63763:
URL: https://github.com/apache/doris/pull/63763#discussion_r3449564519
##########
regression-test/data/nereids_rules_p0/subquery_to_window_function/correlated_scalar_subquery_to_window_function.out:
##########
@@ -0,0 +1,13 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !d26072 --
+10 2 1 7
+11 2 1 7
+20 4 2 10
+30 5 3 8
+
+-- !d26072_no_change --
+2 1 7
+2 1 7
+4 2 10
+5 3 8
+
Review Comment:
`git diff --check` fails on this generated output with `new blank line at
EOF` because the file has an extra empty line after the last expected row.
Please regenerate or trim the `.out` so it ends after the last expected row
with only the normal final newline.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AggScalarSubQueryToWindowFunction.java:
##########
@@ -362,9 +445,66 @@ private Plan rewrite(LogicalFilter<? extends Plan> filter,
LogicalApply<Plan, Pl
windowFilterConjunct = ExpressionUtils.replace(windowFilterConjunct,
ImmutableMap.of(aggOut.toSlot(), aggOutExpr));
- LogicalFilter<Plan> newFilter =
filter.withConjunctsAndChild(conjuncts.get(true), apply.left());
+ // Split uncorrelated conjuncts: predicates that reference ONLY shared
+ // relation slots (tables appearing in both outer and inner plans) must
+ // stay ABOVE the window. Otherwise the window function would see a
+ // different set of rows than the original scalar subquery.
+ //
+ // For example, with fact(f) as shared table and dim(d) as outer-only:
+ // f.v > 6 → shared-only → must stay above the window
+ // d.tag > 0 → outer-only → safe below the window
+ // f.k = d.k → join cond → needed below the window
+ //
+ // We find shared tables by comparing table IDs that appear in both
+ // outer and inner plans, then collect ALL output slots of those
+ // tables (not just columns referenced in the inner query).
+ List<CatalogRelation> outerRels = outerPlans.stream()
+ .filter(CatalogRelation.class::isInstance)
+ .map(CatalogRelation.class::cast)
+ .collect(Collectors.toList());
+ List<CatalogRelation> innerRels = innerPlans.stream()
+ .filter(CatalogRelation.class::isInstance)
+ .map(CatalogRelation.class::cast)
+ .collect(Collectors.toList());
+ Set<Long> innerTableIds = innerRels.stream()
+ .map(r -> r.getTable().getId())
+ .collect(Collectors.toSet());
+ Set<ExprId> sharedOuterExprIds = outerRels.stream()
+ .filter(r -> innerTableIds.contains(r.getTable().getId()))
+ .flatMap(r -> r.getOutput().stream())
+ .map(Slot::getExprId)
+ .collect(Collectors.toSet());
+ Set<Expression> uncorrelatedConjuncts = conjuncts.get(true);
+ Set<Expression> belowWindowConjuncts = Sets.newHashSet();
+ Set<Expression> aboveWindowConjuncts = Sets.newHashSet();
+ if (uncorrelatedConjuncts != null) {
+ for (Expression conj : uncorrelatedConjuncts) {
+ boolean hasShared = false;
+ boolean hasNonShared = false;
+ for (ExprId id : conj.getInputSlotExprIds()) {
+ if (sharedOuterExprIds.contains(id)) {
+ hasShared = true;
+ } else {
+ hasNonShared = true;
+ }
+ }
+ if (hasShared && !hasNonShared) {
Review Comment:
This split also moves shared-table predicates that were part of the inner
subquery filter above the window. `checkFilter` proves that every inner
conjunct exists in the outer filter after slot replacement, but it does not
remember which outer conjuncts were matched. For a matching shape
like:\n\n```text\nFilter(f.k = d.k, f.v < 10, f.v * 2 > sum_alias)\n
Apply(correlation: d.k)\n CrossJoin\n Scan fact f\n Scan dim d --
unique on k\n Aggregate(sum(f2.v) AS sum_alias)\n Filter(f2.k = d.k,
f2.v < 10)\n Scan fact f2\n```\n\n`f2.v < 10` is a real input filter for
the scalar subquery aggregate. After replacement it is `f.v < 10`, but this
code classifies it as shared-only and puts it in `aboveWindowConjuncts`,
leaving the window to compute `SUM(f.v) OVER (PARTITION BY d.k)` over all fact
rows. With fact values `5` and `100` for the same key, the original subquery
sum for the surviving `v=5` row is `5`, while the rewrite uses `105` and can
filter the row inc
orrectly.\n\nPlease track the inner conjuncts matched by `checkFilter` and
force those below the window, or reject cases where shared-table predicates
cannot be separated into matched inner filters versus extra outer filters.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]