github-actions[bot] commented on code in PR #63763:
URL: https://github.com/apache/doris/pull/63763#discussion_r3433910962


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AggScalarSubQueryToWindowFunction.java:
##########
@@ -278,6 +289,19 @@ private boolean checkRelation(List<Slot> correlatedSlots) {
                 .filter(node -> outerIds.contains(node.getTable().getId()))
                 .map(LogicalRelation.class::cast)
                 
.map(LogicalRelation::getOutputExprIdSet).flatMap(Collection::stream).collect(Collectors.toSet());
+        partitionBySlots.addAll(apply.left().getOutput().stream()
+                .filter(slot -> 
correlatedRelationOutput.contains(slot.getExprId()))
+                .collect(Collectors.toList()));
+        if (partitionBySlots.isEmpty()) {
+            return false;

Review Comment:
   This still is not enough to preserve row identity for duplicate-preserving 
tables. `DUPLICATE KEY` tables can contain two rows with identical values for 
every slot in `correlatedRelationOutput`, so this equality check passes even 
though the window partition cannot distinguish those two outer rows. For 
example, if the new regression adds another identical `dim` row `(10, 1, 1)`, 
the original scalar subquery evaluates `SUM(f2.v)` once per `dim` row and keeps 
two `did=10`/`f.id=2` joined rows. After this rewrite, both identical `dim` 
rows share one `(did, k, tag)` partition, each joined `fact` row appears twice 
in that partition, and `SUM(v)` becomes `24` instead of `12`, so those rows are 
filtered out. This is distinct from the previous pruned-output thread because 
here `partitionBySlots` already equals `correlatedRelationOutput`; the missing 
requirement is a proven unique row identity for the outer-only relation, or the 
rule should return false for duplicate-preserving relations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to