yashlimbad commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3044093196
##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4752,6 +4759,17 @@ public ImmutableBitSet build() {
}
return super.visitCall(call);
}
+
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
Review Comment:
@silundong Thanks for the detailed review — I genuinely appreciate the time
you've put into thinking through the edge cases. I want to address the
outer-scope CorrelationId concern specifically, because I've debugged this
end-to-end and I believe the existing Calcite infrastructure already prevents
the scenario you're worried about. Let me walk through it step by step.
---
**1. `getVariablesUsed` already strips out CorrelationIds owned by inner
RelNodes**
Your concern was: if a subquery references both `cor1` (from an outer scope)
and `cor2` (from the current scope), `getVariablesUsed` might return both, and
we'd incorrectly adjust indices for a CorrelationId that doesn't belong to us.
But if you look at how `VariableUsedVisitor` works inside
`RelOptUtil.getVariablesUsed`, there's a critical line:
```java
vuv.variables.removeAll(other.getVariablesSet())
```
What this does is: as the visitor walks the RelNode tree, every time it
encounters a RelNode that *defines* (owns) a CorrelationId via its
`getVariablesSet()`, that id gets **removed** from the collected set. So by the
time `getVariablesUsed` returns, the set only contains CorrelationIds that are
truly **free variables** — meaning they come from an outer scope and are not
defined by any RelNode within the subtree being analyzed.
This is exactly the set we care about when adjusting indices during filter
pushdown. We're not accidentally picking up CorrelationIds that belong to inner
scopes — they've already been pruned.
---
**2. `correlationColumns` is scoped to a specific CorrelationId — no
cross-contamination**
Even beyond `getVariablesUsed`, the actual index collection is done through
`correlationColumns`, which takes a **specific** `CorrelationId` as a parameter:
```java
/** Finds which columns of a correlation variable are used within a
* relational expression. */
public static ImmutableBitSet correlationColumns(CorrelationId id,
RelNode rel) {
final CorrelationCollector collector = new CorrelationCollector();
rel.accept(collector);
final ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
for (int field : collector.vuv.variableFields.get(id)) {
if (field >= 0) {
builder.set(field);
}
}
return builder.build();
}
```
Notice `collector.vuv.variableFields.get(id)` — this is a lookup **keyed by
the specific CorrelationId** we pass in. The `variableFields` is a multimap
that tracks which field indices are accessed for *each* CorrelationId
separately. So even if the subtree contains references to `cor1`, `cor2`, and
`cor3`, calling `correlationColumns(cor2, rel)` will only return the fields
accessed through `cor2`. There is zero risk of `cor1`'s field indices bleeding
into `cor2`'s result.
---
**Putting it together with your example:**
You described: `Filter(condition=[subquery], variablesSet=[cor2])`, where
the subquery has two free variables — `cor2.id` and `cor1.name` (with `cor1`
from an outer scope).
- `getVariablesUsed` on the subquery's rel would return `{cor1, cor2}` as
free variables (inner-defined ones already stripped out).
- But when we call `correlationColumns(cor2, subQuery.rel)`, we only get the
field index for `cor2.id` — not `cor1.name`.
- `cor1.name` is completely ignored in the adjustment because the lookup is
scoped to `cor2`.
So the scenario where we accidentally adjust `cor1.name`'s index thinking it
belongs to `cor2` — that simply cannot happen with the current code. The
infrastructure was designed with exactly this kind of scoping in mind.
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]