[
https://issues.apache.org/jira/browse/CALCITE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057002#comment-18057002
]
Silun Dong commented on CALCITE-7411:
-------------------------------------
Thank you for the detailed explanation.
I think the new algorithm should not skip this logic. This logic is designed to
check if the correlation in the subquery belongs to the current scope. If it
does, the subquery should be rewritten as a {{{}Correlate{}}}; otherwise, it
should be rewritten as a {{Join}} ({{{}Correlate{}}} appears at a higher
level). Skipping this logic would break the correctness of the new algorithm in
scenarios with nested correlated subqueries.
I would prefer that the {{variablesSet}} of top-level {{Project}} correctly
include the {{{}CorrelationId{}}}.
> When a SCALAR_QUERY in PROJECT contains correlated variables execution fails
> using TopDownGeneralDecorrelator
> -------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7411
> URL: https://issues.apache.org/jira/browse/CALCITE-7411
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> Decorrelation fails when executing the following case in RelOptRulesTest
> {code:java}
> @Test void testTopDownGeneralDecorrelateForMeasure() {
> final String sql = "SELECT job,\n"
> + " (SELECT\n"
> + " CAST(SUM(i.sal) - SUM(COALESCE(i.comm, 0)) AS DECIMAL(10,
> 2)) / SUM(i.sal)\n"
> + " FROM emp AS i\n"
> + " WHERE i.job = e.job) AS profitMargin,\n"
> + " COUNT(*) AS \"count\"\n"
> + "FROM emp AS e\n"
> + "GROUP BY job"; sql(sql)
> .withRule(
> CoreRules.PROJECT_SUB_QUERY_TO_MARK_CORRELATE,
> CoreRules.PROJECT_MERGE,
> CoreRules.PROJECT_REMOVE)
> .withLateDecorrelate(true)
> .withTopDownGeneralDecorrelate(true)
> .check();
> } {code}
> SQL to Rel:
> {code:java}
> LogicalProject(JOB=[$0], PROFITMARGIN=[$SCALAR_QUERY({
> LogicalProject(EXPR$0=[/(CAST(-($0, $1)):DECIMAL(10, 2), $0)])
> LogicalAggregate(group=[{}], agg#0=[SUM($0)], agg#1=[SUM($1)])
> LogicalProject(SAL=[$5], $f1=[$6])
> LogicalFilter(condition=[=($2, $cor0.JOB)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> })], count=[$1])
> LogicalAggregate(group=[{0}], count=[COUNT()])
> LogicalProject(JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> There's a problem here: the top-level PROJECT doesn't carry a variablesSet,
> and I'm not sure if it should.
> However after applying the subquery removal rule an incorrect plan is
> produced—it should have generated a LogicalCorrelate but instead generated a
> LogicalJoin.
> {code:java}
> LogicalProject(JOB=[$0], PROFITMARGIN=[$2], count=[$1])
> LogicalJoin(condition=[true], joinType=[left]) <--- here
> LogicalAggregate(group=[{0}], count=[COUNT()])
> LogicalProject(JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(EXPR$0=[/(CAST(-($0, $1)):DECIMAL(10, 2), $0)])
> LogicalAggregate(group=[{}], agg#0=[SUM($0)], agg#1=[SUM($1)])
> LogicalProject(SAL=[$5], $f1=[$6])
> LogicalFilter(condition=[=($2, $cor0.JOB)]) <--- corr var
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> I suspect the issue lies in the PROJECT_SUB_QUERY_TO_MARK_CORRELATE rule.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)