weihua zhang created CALCITE-7394:
-------------------------------------

             Summary: Nested scalar sub-query with multiple levels of 
correlation returns incorrect results
                 Key: CALCITE-7394
                 URL: https://issues.apache.org/jira/browse/CALCITE-7394
             Project: Calcite
          Issue Type: Bug
         Environment: 
{code:sql}
select d.dname,
  (select count(*)
   from emp e
   where e.deptno = d.deptno
   and exists (
     select 1
     from (values (1000), (2000), (3000)) as v(sal)
     where e.sal > v.sal
     and d.deptno * 100 < v.sal
   )
  ) as c
from dept d
order by d.dname;
{code}

right result:

{code:java}
dname    | c 
------------+---
 ACCOUNTING | 2
 OPERATIONS | 0
 RESEARCH   | 0
 SALES      | 0
(4 rows)
{code}

but return:

{code:java}
+------------+---+
| dname      | c |
+------------+---+
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| RESEARCH   | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
| SALES      | 0 |
+------------+---+
{code}



Root Cause Analysis:

The issue appears to be in RelDecorrelator. When handling nested correlated 
sub-queries, it incorrectly includes bound variables (variables defined within 
the inner scope) into the Value Generator.

Specifically, when RelDecorrelator generates a Value Generator for a subtree 
(e.g., in rewriteScalarAggregate, decorrelateRel(SetOp), or 
decorrelateRel(Join)), it scans the subtree to collect all referenced 
correlation variables. If the subtree contains nested Correlate nodes (nested 
sub-queries), the correlation variables defined by these internal nodes are 
also collected.

This causes the generated Value Generator to include not only the external 
correlation variables (Free Variables) but also the internal correlation 
variables. This introduces extra dimensions in the Value Generator, leading to 
a Cartesian product and result expansion.

Suggested Fix:

When collecting correlation variables for a subtree, we should check the 
definition scope of each variable. If a variable is defined within the current 
subtree (i.e., it exists in localCorelMap.mapCorToCorRel), it should be 
excluded from the Value Generator's correlation list. Only true external 
variables should be included.
            Reporter: weihua zhang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to