[
https://issues.apache.org/jira/browse/CALCITE-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044088#comment-18044088
]
Zhen Chen commented on CALCITE-7319:
------------------------------------
Adding the following test case to RelToSqlConverterTest can reproduce this
problem.
{code:java}
@Test void testFilterIntoJoinMissingVariableCor() {
final String sql = "SELECT E.EMPNO\n"
+ "FROM EMP E\n"
+ "JOIN DEPT D ON E.DEPTNO = D.DEPTNO\n"
+ "WHERE D.DEPTNO = (\n"
+ " SELECT MIN(D_INNER.DEPTNO)\n"
+ " FROM DEPT D_INNER\n"
+ " WHERE D_INNER.DEPTNO = E.DEPTNO)";
final String expected = "SELECT \"EMP\".\"EMPNO\"\n"
+ "FROM \"SCOTT\".\"EMP\"\n"
+ "INNER JOIN \"SCOTT\".\"DEPT\" ON \"EMP\".\"DEPTNO\" =
\"DEPT\".\"DEPTNO\"\n"
+ "WHERE \"DEPT\".\"DEPTNO\" = (((SELECT MIN(\"DEPTNO\")\n"
+ "FROM \"SCOTT\".\"DEPT\"\n"
+ "WHERE \"DEPTNO\" = \"EMP\".\"DEPTNO\")))"; HepProgramBuilder
builder = new HepProgramBuilder();
builder.addRuleClass(FilterJoinRule.FilterIntoJoinRule.class);
HepPlanner hepPlanner = new HepPlanner(builder.build());
RuleSet rules =
RuleSets.ofList(CoreRules.FILTER_INTO_JOIN); sql(sql)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withCalcite()
.optimize(rules, hepPlanner)
.ok(expected);
} {code}
Should we prohibit pushing down FILTERs with correlated subqueries to the
children of a JOIN?
> FILTER_INTO_JOIN Calcite HepPlanner rules loose variable context for
> subqueries
> -------------------------------------------------------------------------------
>
> Key: CALCITE-7319
> URL: https://issues.apache.org/jira/browse/CALCITE-7319
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Environment: Java
> Reporter: Matthew White
> Priority: Major
>
> FILTER_INTO_JOIN Calcite HepPlanner rules loose variable context for
> subqueries
> For queries that involve a subplan and therefore correlation variables,
> applying some HepPlanner rules means
> that the correlation variables. (The variable set in the nodes) is lost.
> As an example start with this SQL query with a subquery.
> ```sql
> final String sql = """
> SELECT
> P.P_PARTKEY
> FROM
> PART P
> JOIN PARTSUPP PS ON P.P_PARTKEY = PS.PS_PARTKEY
> WHERE
> PS.PS_SUPPLYCOST = (
> SELECT
> MIN(PS_INNER.PS_SUPPLYCOST)
> FROM
> PARTSUPP PS_INNER
> WHERE
> PS_INNER.PS_PARTKEY = P.P_PARTKEY
> )
> """;
> ```
> When this is parsed and converted to Calcite RelNodes, there is a reference
> to `$cor0.P_PARTKEY` and the `LogicalFilter` has a varaibleSet present. The
> 'explain()' format of the nodes gives:
> ```
> LogicalProject(P_PARTKEY=[$0])
> LogicalFilter(condition=[=($12, $SCALAR_QUERY({
> LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
> LogicalProject(PS_SUPPLYCOST=[$3])
> LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
> LogicalTableScan(table=[[TPCH, PARTSUPP]])
> }))], variablesSet=[[$cor0]])
> LogicalJoin(condition=[=($0, $9)], joinType=[inner])
> LogicalTableScan(table=[[TPCH, PART]])
> LogicalTableScan(table=[[TPCH, PARTSUPP]])
> ```
> Applying the `FILTER_INTO_JOIN` rule. (we believe that other rules can also
> have the same effect) result in an optimized query, with the calcite nodes
> being as follows
> ```
> LogicalProject(P_PARTKEY=[$0])
> LogicalJoin(condition=[=($0, $9)], joinType=[inner])
> LogicalTableScan(table=[[TPCH, PART]])
> LogicalFilter(condition=[=($3, $SCALAR_QUERY({
> LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
> LogicalProject(PS_SUPPLYCOST=[$3])
> LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
> LogicalTableScan(table=[[TPCH, PARTSUPP]])
> }))])
> LogicalTableScan(table=[[TPCH, PARTSUPP]])
> ```
> Note that the `LogicalFilter` still has the condition `[=($0,
> $cor0.P_PARTKEY)]` - but no variableset present on any node.
> Attempting to convert this back into SQL via Calcite gives
> ```
> java.lang.NullPointerException: variable $cor0 is not found
> ```
> The complete code is available in this [github
> gist](https://gist.github.com/mbwhite/4899dfd959839a6eedd6ee1542c9fb08)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)