[
https://issues.apache.org/jira/browse/CALCITE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057001#comment-18057001
]
Zhen Chen commented on CALCITE-7411:
------------------------------------
Thank you for your quick reply! [~dongsl]
I fully agree with the point you raised. However, there's currently a piece of
information worth noting: the old algorithm was able to handle this scenario
correctly. The new approach—removing the PROJECT subquery—is essentially
similar to the old algorithm, except that it adds an extra validation step:
{{{}variablesSet.retainAll(variablesSetOfRelNode){}}}. This causes the
variables obtained via {{RelOptUtil.getVariablesUsed(e.rel)}} to be cleared
when {{variablesSet}} of the PROJECT becomes empty.
I've implemented a temporary fix: when {{variablesSetOfRelNode}} is empty, we
skip this validation step. This change aligns the new logic with the behavior
of the old algorithm. I'm wondering if you agree with this
compatibility-oriented modification. Please see the PR for details; it's a very
simple change.
> When a SCALAR_QUERY in PROJECT contains correlated variables execution fails
> using TopDownGeneralDecorrelator
> -------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7411
> URL: https://issues.apache.org/jira/browse/CALCITE-7411
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Major
> Fix For: 1.42.0
>
>
> Decorrelation fails when executing the following case in RelOptRulesTest
> {code:java}
> @Test void testTopDownGeneralDecorrelateForMeasure() {
> final String sql = "SELECT job,\n"
> + " (SELECT\n"
> + " CAST(SUM(i.sal) - SUM(COALESCE(i.comm, 0)) AS DECIMAL(10,
> 2)) / SUM(i.sal)\n"
> + " FROM emp AS i\n"
> + " WHERE i.job = e.job) AS profitMargin,\n"
> + " COUNT(*) AS \"count\"\n"
> + "FROM emp AS e\n"
> + "GROUP BY job"; sql(sql)
> .withRule(
> CoreRules.PROJECT_SUB_QUERY_TO_MARK_CORRELATE,
> CoreRules.PROJECT_MERGE,
> CoreRules.PROJECT_REMOVE)
> .withLateDecorrelate(true)
> .withTopDownGeneralDecorrelate(true)
> .check();
> } {code}
> SQL to Rel:
> {code:java}
> LogicalProject(JOB=[$0], PROFITMARGIN=[$SCALAR_QUERY({
> LogicalProject(EXPR$0=[/(CAST(-($0, $1)):DECIMAL(10, 2), $0)])
> LogicalAggregate(group=[{}], agg#0=[SUM($0)], agg#1=[SUM($1)])
> LogicalProject(SAL=[$5], $f1=[$6])
> LogicalFilter(condition=[=($2, $cor0.JOB)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> })], count=[$1])
> LogicalAggregate(group=[{0}], count=[COUNT()])
> LogicalProject(JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> There's a problem here: the top-level PROJECT doesn't carry a variablesSet,
> and I'm not sure if it should.
> However after applying the subquery removal rule an incorrect plan is
> produced—it should have generated a LogicalCorrelate but instead generated a
> LogicalJoin.
> {code:java}
> LogicalProject(JOB=[$0], PROFITMARGIN=[$2], count=[$1])
> LogicalJoin(condition=[true], joinType=[left]) <--- here
> LogicalAggregate(group=[{0}], count=[COUNT()])
> LogicalProject(JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(EXPR$0=[/(CAST(-($0, $1)):DECIMAL(10, 2), $0)])
> LogicalAggregate(group=[{}], agg#0=[SUM($0)], agg#1=[SUM($1)])
> LogicalProject(SAL=[$5], $f1=[$6])
> LogicalFilter(condition=[=($2, $cor0.JOB)]) <--- corr var
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> I suspect the issue lies in the PROJECT_SUB_QUERY_TO_MARK_CORRELATE rule.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)