silundong commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3076902238


##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -2934,15 +2934,40 @@ public static boolean classifyFilters(
     ImmutableBitSet rightBitmap =
         ImmutableBitSet.range(nSysFields + nFieldsLeft, nTotalFields);
 
+    // Correlation variables introduced by this join itself: i.e. ids whose
+    // binding is established by *this* join (the left input is the producer,
+    // the right input is the consumer). A predicate that references any of
+    // these cannot be pushed to the LEFT input -- the left subtree has no
+    // binder for the variable above it, so the reference would be stranded.
+    // Pushing such a predicate to the RIGHT input is still safe (the right
+    // subtree is the natural consumer of the binding established by this
+    // join), and it can of course stay on the join itself.

Review Comment:
   Based on past community discussions and the current implementation, the 
`CorrelationId` in `variablesSet` of `Join` represents the concatenation of the 
left and right rows; the `CorrelationId` in `Correlate` represents rows that 
are produced by the left side and consumed by the right side. 
   
   To avoid confusion, the comment here may need to be adjusted. The core 
summary of our previous discussion is: pushing down correlated subqueries 
carries risks and involves extremely complex logic, and therefore pushing down 
is prohibited.



##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4752,6 +4837,24 @@ public ImmutableBitSet build() {
       }
       return super.visitCall(call);
     }
+
+    @Override public Void visitSubQuery(RexSubQuery subQuery) {
+      final Set<CorrelationId> variablesSet = 
RelOptUtil.getVariablesUsed(subQuery.rel);

Review Comment:
   We only care about `localCorrelationIds`, so if `localCorrelationIds` is 
null, return immediately; otherwise, iterate over `localCorrelationIds` and 
call `RelOptUtil.correlationColumns`.
    `RelOptUtil.getVariablesUsed(subQuery.rel)` is unnecessary.



##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4678,12 +4735,24 @@ public RexCorrelVariableMapShuttle(final CorrelationId 
correlationId,
   public static class InputFinder extends RexVisitorImpl<Void> {
     private final ImmutableBitSet.Builder bitBuilder;
     private final @Nullable Set<RelDataTypeField> extraFields;
+    /** Correlation ids that are considered "local" when analysing
+     * sub-queries: only these contribute bits via {@link #visitSubQuery}.
+     * If null, all correlation ids encountered inside a sub-query

Review Comment:
   If null, `visitSubQuery` should return immediately.



##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -3000,6 +3025,38 @@ public static boolean classifyFilters(
     return !filtersToRemove.isEmpty();
   }
 
+  /**
+   * Returns whether {@code node} references any {@link CorrelationId} in
+   * {@code ids}, either directly via a {@link RexCorrelVariable} (typically
+   * reached through a {@link org.apache.calcite.rex.RexFieldAccess}) or
+   * transitively via the inner plan of a {@link RexSubQuery}.
+   */
+  private static boolean referencesAnyCorrelation(RexNode node,

Review Comment:
   The purpose of `RexUtil.CorrelationFinder` is similar to this. Would you 
consider extending it? That would reuse and strengthen the existing capability.



##########
core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java:
##########
@@ -198,14 +202,42 @@ protected void perform(RelOptRuleCall call, @Nullable 
Filter filter,
       return;
     }
 
+    // Collect the correlation variables (introduced via sub-queries) 
referenced
+    // by each bucket of predicates after classifyFilters has decided where 
every
+    // conjunct lives. The buckets are needed to (a) plumb variablesSet onto 
the
+    // newly-created child Filters and (b) recompute the join's own 
variablesSet
+    // without dropping ids that the surviving join condition still references.
+    final Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();

Review Comment:
   Maybe the changes in the `RelOptUtil` are sufficient to meet our needs; no 
other files need to be modified.



##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -2934,15 +2934,40 @@ public static boolean classifyFilters(
     ImmutableBitSet rightBitmap =
         ImmutableBitSet.range(nSysFields + nFieldsLeft, nTotalFields);
 
+    // Correlation variables introduced by this join itself: i.e. ids whose
+    // binding is established by *this* join (the left input is the producer,
+    // the right input is the consumer). A predicate that references any of
+    // these cannot be pushed to the LEFT input -- the left subtree has no
+    // binder for the variable above it, so the reference would be stranded.
+    // Pushing such a predicate to the RIGHT input is still safe (the right
+    // subtree is the natural consumer of the binding established by this
+    // join), and it can of course stay on the join itself.
+    final Set<CorrelationId> joinCorrelationIds = joinRel instanceof Join
+        ? joinRel.getVariablesSet()
+        : ImmutableSet.of();
+
     final List<RexNode> filtersToRemove = new ArrayList<>();
     for (RexNode filter : filters) {
-      final InputFinder inputFinder = InputFinder.analyze(filter);
+
+      // Only consider correlation ids bound by *this* join when computing
+      // the input bitmap of a sub-query inside the predicate. Foreign
+      // correlation ids are bound by an outer scope and their
+      // correlationColumns indices would otherwise alias onto unrelated
+      // columns of this join's row type, mis-classifying the predicate.
+      final InputFinder inputFinder = InputFinder.analyze(filter, 
joinCorrelationIds);
       final ImmutableBitSet inputBits = inputFinder.build();
 
+      // Disable left-push only for filters that reference a CorrelationId
+      // bound by this join.

Review Comment:
   Disable pushing down to both, not just to the left side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to