silundong commented on code in PR #4691:
URL: https://github.com/apache/calcite/pull/4691#discussion_r2635417739


##########
core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java:
##########
@@ -673,7 +673,7 @@ public TrimResult trimFields(
 
     // Build new filter with trimmed input and condition.
     relBuilder.push(newInput)
-        .filter(filter.getVariablesSet(), newConditionExpr);
+        .filter(correlationIds, newConditionExpr);

Review Comment:
   Correlated subqueries in join condition are a tricky issue (the case in 
[CALCITE-7274](https://issues.apache.org/jira/browse/CALCITE-7274)). Here's my 
understanding:
   
   According to the logic in `SqlToRelConverter` and `LogicalJoin`, the 
`variablesSet` of `LogicalJoin` describes the variables coming from the left 
input that are referenced in the right input. This is also confirmed by the 
logic of `RelBuilder` constructing a join: `RelBuilder.join` will produce a 
`Correlate` (not a `Join`) when it receives a non-empty `variablesSet`.
   
   However, if a subquery in the join condition uses a correlated variable 
(call it `cor0`) from the current scope, that `cor0` is intended to represent 
the concatenation of the left and right rows, not only variable originating 
from the left input. That interpretation does not match the `variablesSet` 
concept in `LogicalJoin`, so the CorrelationId used by the subquery is not 
added to `variablesSet`. This explains why `LogicalJoin.variablesSet` is always 
empty even though the join condition contains correlated variables from the 
current scope. In other words, we are forced to treat free variables used by 
subqueries in join condition as coming from the current scope rather than from 
outer scopes, which also causes problems with removing nested correlated 
subqueries and decorrelating them.
   
   Back to this PR: based on the above, perhaps we should retain the original 
logic. For `Filter`, its `variablesSet` is correct in the first place; for 
`Join`, filling the `variablesSet` with the correlated variables used in the 
subquery of the condition contradicts the definition of 
`LogicalJoin.variablesSet`.
   
   This is my personal understanding and suggestion for your reference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to