yashlimbad commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3044093196


##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4752,6 +4759,17 @@ public ImmutableBitSet build() {
       }
       return super.visitCall(call);
     }
+
+    @Override public Void visitSubQuery(RexSubQuery subQuery) {

Review Comment:
   @silundong Thanks for the detailed review — I genuinely appreciate the time 
you've put into thinking through the edge cases. I want to address the 
outer-scope CorrelationId concern specifically, because I've debugged this 
end-to-end and I believe the existing Calcite infrastructure already prevents 
the scenario you're worried about. Let me walk through it step by step.
   
   ---
   
   **1. `getVariablesUsed` already strips out CorrelationIds owned by inner 
RelNodes**
   
   Your concern was: if a subquery references both `cor1` (from an outer scope) 
and `cor2` (from the current scope), `getVariablesUsed` might return both, and 
we'd incorrectly adjust indices for a CorrelationId that doesn't belong to us.
   
   But if you look at how `VariableUsedVisitor` works inside 
`RelOptUtil.getVariablesUsed`, there's a critical line:
   
   ```java
   vuv.variables.removeAll(other.getVariablesSet())
   ```
   
   What this does is: as the visitor walks the RelNode tree, every time it 
encounters a RelNode that *defines* (owns) a CorrelationId via its 
`getVariablesSet()`, that id gets **removed** from the collected set. So by the 
time `getVariablesUsed` returns, the set only contains CorrelationIds that are 
truly **free variables** — meaning they come from an outer scope and are not 
defined by any RelNode within the subtree being analyzed.
   
   This is exactly the set we care about when adjusting indices during filter 
pushdown. We're not accidentally picking up CorrelationIds that belong to inner 
scopes — they've already been pruned.
   
   ---
   
   **2. `correlationColumns` is scoped to a specific CorrelationId — no 
cross-contamination**
   
   Even beyond `getVariablesUsed`, the actual index collection is done through 
`correlationColumns`, which takes a **specific** `CorrelationId` as a parameter:
   
   ```java
   /** Finds which columns of a correlation variable are used within a
    * relational expression. */
   public static ImmutableBitSet correlationColumns(CorrelationId id,
       RelNode rel) {
     final CorrelationCollector collector = new CorrelationCollector();
     rel.accept(collector);
     final ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
     for (int field : collector.vuv.variableFields.get(id)) {
       if (field >= 0) {
         builder.set(field);
       }
     }
     return builder.build();
   }
   ```
   
   Notice `collector.vuv.variableFields.get(id)` — this is a lookup **keyed by 
the specific CorrelationId** we pass in. The `variableFields` is a multimap 
that tracks which field indices are accessed for *each* CorrelationId 
separately. So even if the subtree contains references to `cor1`, `cor2`, and 
`cor3`, calling `correlationColumns(cor2, rel)` will only return the fields 
accessed through `cor2`. There is zero risk of `cor1`'s field indices bleeding 
into `cor2`'s result.
   
   ---
   
   **Putting it together with your example:**
   
   You described: `Filter(condition=[subquery], variablesSet=[cor2])`, where 
the subquery has two free variables — `cor2.id` and `cor1.name` (with `cor1` 
from an outer scope).
   
   - `getVariablesUsed` on the subquery's rel would return `{cor1, cor2}` as 
free variables (inner-defined ones already stripped out).
   - But when we call `correlationColumns(cor2, subQuery.rel)`, we only get the 
field index for `cor2.id` — not `cor1.name`.
   - `cor1.name` is completely ignored in the adjustment because the lookup is 
scoped to `cor2`.
   
   So the scenario where we accidentally adjust `cor1.name`'s index thinking it 
belongs to `cor2` — that simply cannot happen with the current code. The 
infrastructure was designed with exactly this kind of scoping in mind.
   
   ---
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to