weihua zhang created CALCITE-7394:
-------------------------------------
Summary: Nested scalar sub-query with multiple levels of
correlation returns incorrect results
Key: CALCITE-7394
URL: https://issues.apache.org/jira/browse/CALCITE-7394
Project: Calcite
Issue Type: Bug
Environment:
{code:sql}
select d.dname,
(select count(*)
from emp e
where e.deptno = d.deptno
and exists (
select 1
from (values (1000), (2000), (3000)) as v(sal)
where e.sal > v.sal
and d.deptno * 100 < v.sal
)
) as c
from dept d
order by d.dname;
{code}
right result:
{code:java}
dname | c
------------+---
ACCOUNTING | 2
OPERATIONS | 0
RESEARCH | 0
SALES | 0
(4 rows)
{code}
but return:
{code:java}
+------------+---+
| dname | c |
+------------+---+
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| ACCOUNTING | 2 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| OPERATIONS | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| RESEARCH | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
| SALES | 0 |
+------------+---+
{code}
Root Cause Analysis:
The issue appears to be in RelDecorrelator. When handling nested correlated
sub-queries, it incorrectly includes bound variables (variables defined within
the inner scope) into the Value Generator.
Specifically, when RelDecorrelator generates a Value Generator for a subtree
(e.g., in rewriteScalarAggregate, decorrelateRel(SetOp), or
decorrelateRel(Join)), it scans the subtree to collect all referenced
correlation variables. If the subtree contains nested Correlate nodes (nested
sub-queries), the correlation variables defined by these internal nodes are
also collected.
This causes the generated Value Generator to include not only the external
correlation variables (Free Variables) but also the internal correlation
variables. This introduces extra dimensions in the Value Generator, leading to
a Cartesian product and result expansion.
Suggested Fix:
When collecting correlation variables for a subtree, we should check the
definition scope of each variable. If a variable is defined within the current
subtree (i.e., it exists in localCorelMap.mapCorToCorRel), it should be
excluded from the Value Generator's correlation list. Only true external
variables should be included.
Reporter: weihua zhang
--
This message was sent by Atlassian Jira
(v8.20.10#820010)