[
https://issues.apache.org/jira/browse/CALCITE-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-7394:
------------------------------------
Labels: pull-request-available (was: )
> Nested sub-query with multiple levels of correlation returns incorrect results
> ------------------------------------------------------------------------------
>
> Key: CALCITE-7394
> URL: https://issues.apache.org/jira/browse/CALCITE-7394
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.42.0
> Reporter: weihua zhang
> Priority: Major
> Labels: pull-request-available
>
> {code:sql}
> select d.dname,
> (select count(*)
> from emp e
> where e.deptno = d.deptno
> and exists (
> select 1
> from (values (1000), (2000), (3000)) as v(sal)
> where e.sal > v.sal
> and d.deptno * 100 < v.sal
> )
> ) as c
> from dept d
> order by d.dname;
> {code}
> right result:
> {code:java}
> dname | c
> ------------+---
> ACCOUNTING | 2
> OPERATIONS | 0
> RESEARCH | 0
> SALES | 0
> (4 rows)
> {code}
> but return:
> {code:java}
> +------------+---+
> | dname | c |
> +------------+---+
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | RESEARCH | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> | SALES | 0 |
> +------------+---+
> {code}
> Root Cause Analysis:
> The issue appears to be in RelDecorrelator. When handling nested correlated
> sub-queries, it incorrectly includes bound variables (variables defined
> within the inner scope) into the Value Generator.
> Specifically, when RelDecorrelator generates a Value Generator for a subtree
> (e.g., in rewriteScalarAggregate, decorrelateRel(SetOp), or
> decorrelateRel(Join)), it scans the subtree to collect all referenced
> correlation variables. If the subtree contains nested Correlate nodes (nested
> sub-queries), the correlation variables defined by these internal nodes are
> also collected.
> This causes the generated Value Generator to include not only the external
> correlation variables (Free Variables) but also the internal correlation
> variables. This introduces extra dimensions in the Value Generator, leading
> to a Cartesian product and result expansion.
> Suggested Fix:
> When collecting correlation variables for a subtree, we should check the
> definition scope of each variable. If a variable is defined within the
> current subtree (i.e., it exists in localCorelMap.mapCorToCorRel), it should
> be excluded from the Value Generator's correlation list. Only true external
> variables should be included.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)