github-actions[bot] commented on code in PR #63763:
URL: https://github.com/apache/doris/pull/63763#discussion_r3456641011


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AggScalarSubQueryToWindowFunction.java:
##########
@@ -362,9 +478,134 @@ private Plan rewrite(LogicalFilter<? extends Plan> 
filter, LogicalApply<Plan, Pl
         windowFilterConjunct = ExpressionUtils.replace(windowFilterConjunct,
                 ImmutableMap.of(aggOut.toSlot(), aggOutExpr));
 
-        LogicalFilter<Plan> newFilter = 
filter.withConjunctsAndChild(conjuncts.get(true), apply.left());
+        // Split uncorrelated conjuncts: predicates that reference ONLY shared
+        // relation slots (tables appearing in both outer and inner plans) must
+        // stay ABOVE the window. Otherwise the window function would see a
+        // different set of rows than the original scalar subquery.
+        //
+        // For example, with fact(f) as shared table and dim(d) as outer-only:
+        //   f.v > 6        → shared-only → must stay above the window
+        //   d.tag > 0      → outer-only  → safe below the window
+        //   f.k = d.k      → join cond   → needed below the window
+        //
+        // We find shared tables by comparing table IDs that appear in both
+        // outer and inner plans, then collect ALL output slots of those
+        // tables (not just columns referenced in the inner query).
+        List<CatalogRelation> outerRels = outerPlans.stream()
+                .filter(CatalogRelation.class::isInstance)
+                .map(CatalogRelation.class::cast)
+                .collect(Collectors.toList());
+        List<CatalogRelation> innerRels = innerPlans.stream()
+                .filter(CatalogRelation.class::isInstance)
+                .map(CatalogRelation.class::cast)
+                .collect(Collectors.toList());
+        Set<Long> innerTableIds = innerRels.stream()
+                .map(r -> r.getTable().getId())
+                .collect(Collectors.toSet());
+        Set<ExprId> sharedOuterExprIds = outerRels.stream()
+                .filter(r -> innerTableIds.contains(r.getTable().getId()))
+                .flatMap(r -> r.getOutput().stream())
+                .map(Slot::getExprId)
+                .collect(Collectors.toSet());
+        Set<Expression> uncorrelatedConjuncts = conjuncts.get(true);
+        Set<Expression> belowWindowConjuncts = Sets.newHashSet();
+        Set<Expression> aboveWindowConjuncts = Sets.newHashSet();
+        if (uncorrelatedConjuncts != null) {
+            for (Expression conj : uncorrelatedConjuncts) {
+                // Conjuncts that were matched against inner subquery filter
+                // conjuncts (tracked by checkFilter) must stay BELOW the
+                // window.  They are semantically part of the inner aggregate's
+                // filter, not extra outer-only predicates.  Placing them above
+                // the window would let the window see more rows than the
+                // original scalar subquery, producing wrong aggregate results.
+                if (matchedInnerFilterConjuncts.contains(conj)) {
+                    belowWindowConjuncts.add(conj);
+                    continue;
+                }
+                // Volatile predicates (e.g. random() > 0.5) must stay ABOVE
+                // the window.  Pushing them below would let the window
+                // aggregate over a different set of rows per partition than
+                // the original scalar subquery.  PushDownFilterThroughWindow
+                // has the same hazard and explicitly rejects volatile
+                // predicates for this reason.
+                if (conj.containsVolatileExpression()) {
+                    aboveWindowConjuncts.add(conj);
+                    continue;
+                }
+                // Any predicate that references shared-table slots (tables
+                // appearing in both outer and inner plans) must stay ABOVE
+                // the window.  This handles both shared-only predicates
+                // (e.g. f.v > 6) and mixed shared+outer predicates
+                // (e.g. f.v > d.tag).  Pushing them below would restrict
+                // the rows seen by the window function, producing a
+                // different aggregate than the original scalar subquery.
+                boolean hasShared = false;
+                for (ExprId id : conj.getInputSlotExprIds()) {
+                    if (sharedOuterExprIds.contains(id)) {
+                        hasShared = true;
+                        break;
+                    }
+                }
+                if (hasShared) {
+                    aboveWindowConjuncts.add(conj);
+                } else {
+                    // No shared-table references: the predicate involves only
+                    // outer-only table columns.  Safe to place below the 
window.
+                    belowWindowConjuncts.add(conj);
+                }
+            }
+        }
+
+        // Extract and classify conjuncts from any LogicalFilter nodes nested
+        // inside the outer child (apply.left()).  Nested shared-table filters
+        // (e.g. f.v > 6 under a CrossJoin) must be hoisted ABOVE the window;
+        // otherwise the window would aggregate over a filtered subset of rows
+        // while the original scalar subquery sees all rows for the key.
+        //
+        // Nested outer-only filters (e.g. d.tag > 0) are safe below and can
+        // stay in place after the filters are stripped.
+        List<LogicalFilter<Plan>> nestedOuterFilters = apply.left()
+                .collectToList(LogicalFilter.class::isInstance);
+        for (LogicalFilter<Plan> nf : nestedOuterFilters) {
+            for (Expression conj : nf.getConjuncts()) {
+                // Matched inner-filter conjuncts always go below.
+                if (matchedInnerFilterConjuncts.contains(conj)) {
+                    belowWindowConjuncts.add(conj);
+                    continue;
+                }
+                // Volatile predicates must stay above the window.
+                if (conj.containsVolatileExpression()) {
+                    aboveWindowConjuncts.add(conj);
+                    continue;
+                }
+                boolean nfHasShared = false;
+                for (ExprId id : conj.getInputSlotExprIds()) {
+                    if (sharedOuterExprIds.contains(id)) {
+                        nfHasShared = true;
+                        break;
+                    }
+                }
+                if (nfHasShared) {
+                    // Shared-table predicate → hoist above the window.
+                    aboveWindowConjuncts.add(conj);

Review Comment:
   This still breaks when a nested filter sits below a pruning project. For 
example, `Project(f.k) -> Filter(f.v > 6) -> Scan(f(k,v))` is allowed by 
`checkProject()`, but this code strips the filter, leaves `Project(f.k)` in 
place, and then adds `f.v > 6` above the window; the top filter has a dangling 
`f.v` reference because the window child only outputs `f.k` plus the window 
alias. The same problem also happens for outer-only nested predicates: 
`Project(d.k) -> Filter(d.tag > 0) -> Scan(d(k,tag))` is stripped and `d.tag > 
0` is reinserted below the window above a child that only outputs `d.k`. Please 
reject extracting filters through projects unless all predicate input slots 
survive at the reinsertion point, or carry the needed slots through and restore 
the original projection afterward.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to