[ 
https://issues.apache.org/jira/browse/CALCITE-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-7394:
------------------------------------
    Labels: pull-request-available  (was: )

> Nested sub-query with multiple levels of correlation returns incorrect results
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-7394
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7394
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.42.0
>            Reporter: weihua zhang
>            Priority: Major
>              Labels: pull-request-available
>
> {code:sql}
> select d.dname,
>   (select count(*)
>    from emp e
>    where e.deptno = d.deptno
>    and exists (
>      select 1
>      from (values (1000), (2000), (3000)) as v(sal)
>      where e.sal > v.sal
>      and d.deptno * 100 < v.sal
>    )
>   ) as c
> from dept d
> order by d.dname;
> {code}
> right result:
> {code:java}
> dname    | c 
> ------------+---
>  ACCOUNTING | 2
>  OPERATIONS | 0
>  RESEARCH   | 0
>  SALES      | 0
> (4 rows)
> {code}
> but return:
> {code:java}
> +------------+---+
> | dname      | c |
> +------------+---+
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | ACCOUNTING | 2 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | OPERATIONS | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | RESEARCH   | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> | SALES      | 0 |
> +------------+---+
> {code}
> Root Cause Analysis:
> The issue appears to be in RelDecorrelator. When handling nested correlated 
> sub-queries, it incorrectly includes bound variables (variables defined 
> within the inner scope) into the Value Generator.
> Specifically, when RelDecorrelator generates a Value Generator for a subtree 
> (e.g., in rewriteScalarAggregate, decorrelateRel(SetOp), or 
> decorrelateRel(Join)), it scans the subtree to collect all referenced 
> correlation variables. If the subtree contains nested Correlate nodes (nested 
> sub-queries), the correlation variables defined by these internal nodes are 
> also collected.
> This causes the generated Value Generator to include not only the external 
> correlation variables (Free Variables) but also the internal correlation 
> variables. This introduces extra dimensions in the Value Generator, leading 
> to a Cartesian product and result expansion.
> Suggested Fix:
> When collecting correlation variables for a subtree, we should check the 
> definition scope of each variable. If a variable is defined within the 
> current subtree (i.e., it exists in localCorelMap.mapCorToCorRel), it should 
> be excluded from the Value Generator's correlation list. Only true external 
> variables should be included.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to