[
https://issues.apache.org/jira/browse/CALCITE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruben Q L updated CALCITE-5789:
-------------------------------
Description:
Problem can be reproduced with the following query (to be added e.g. in
sub-query.iq):
{code:sql}
select deptno from dept d1 where exists (
select 1 from dept d2 where d2.deptno = d1.deptno and exists (
select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
{code}
The problem appears with at least two nested subqueries, when the inner most
references the outermost table, in our case {{and d3.dname = d1.dname}} (if we
remove this expression, the problem does not occur).
When the above query is processed, the following plan is generated (notice how
the top-most projection contains two fields instead of one):
{noformat}
// Plan2 (Decorrelator output)
LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalTableScan(table=[[scott, DEPT]])
LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[scott, DEPT]])
LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
LogicalFilter(condition=[IS NOT NULL($1)])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
Even if this minimalist test does not fail, it leads to a situation where the
{{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match (see
attached screenshot) , which can lead to unforeseeable consequences (for the
record, I have seen more complex queries which do fail at execution time
because of this issue).
The culprit that generates the unexpected extra field in the final projection
is RelDecorrelator, however I think the decorrelator is not to blame here,
because the input that reaches it is already wrong:
{noformat}
// Plan1 (SubQueryRemoveRule output, Decorrelator input)
LogicalProject(DEPTNO=[$0])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[inner],
requiredColumns=[{0, 1}])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalTableScan(table=[[scott, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
*** LogicalJoin(condition=[true], joinType=[inner],
variablesSet=[[$cor1, $cor0]])
LogicalTableScan(table=[[scott, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
*** LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1,
$cor0.DNAME))])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
Notice how there is a correlated variable $cor1 used by the LogicalJoin, which
is not generated by any LogicalCorrelation.
Who generates this wrong plan? SubQueryRemoveRule, which takes the original
plan and removes the nested subqueries with the questionable above result:
{noformat}
// Plan0 (Original, SubQueryRemoveRule input)
LogicalProject(DEPTNO=[$0])
LogicalFilter(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
LogicalTableScan(table=[[scott, DEPT]])
}))], variablesSet=[[$cor1]])
LogicalTableScan(table=[[scott, DEPT]])
})], variablesSet=[[$cor0]])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
was:
Problem can be reproduced with the following query (to be added e.g. in
sub-query.iq):
{code:sql}
select deptno from dept d1 where exists (
select 1 from dept d2 where d2.deptno = d1.deptno and exists (
select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
{code}
The problem appears with at least two nested subqueries, when the inner most
references the outermost table, in our case {{and d3.dname = d1.dname}} (if we
remove this expression, the problem does not occur).
When the above query is processed, the following plan is generated (notice how
the top-most projection contains two fields instead of one):
{noformat}
// Plan2 (Decorrelator output)
LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalTableScan(table=[[scott, DEPT]])
LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[scott, DEPT]])
LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
LogicalFilter(condition=[IS NOT NULL($1)])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
Even if this minimalist test does not fail, it leads to a situation where the
{{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match
!debugger.png! , which can lead to unforeseeable consequences (for the record,
I have seen more complex queries which do fail at execution time because of
this issue).
The culprit that generates the unexpected extra field in the final projection
is RelDecorrelator, however I think the decorrelator is not to blame here,
because the input that reaches it is already wrong:
{noformat}
Plan1 (SubQueryRemoveRule output, Decorrelator input)
LogicalProject(DEPTNO=[$0])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[inner],
requiredColumns=[{0, 1}])
LogicalProject(DEPTNO=[$0], DNAME=[$1])
LogicalTableScan(table=[[scott, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
*** LogicalJoin(condition=[true], joinType=[inner],
variablesSet=[[$cor1, $cor0]])
LogicalTableScan(table=[[scott, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
*** LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1,
$cor0.DNAME))])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
Notice how there is a correlated variable $cor1 used by the LogicalJoin, which
is not generated by any LogicalCorrelation.
Who generates this wrong plan? SubQueryRemoveRule, which takes the original
plan and removes the nested subqueries with the questionable above result:
{noformat}
Plan0 (Original, SubQueryRemoveRule input)
LogicalProject(DEPTNO=[$0])
LogicalFilter(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
LogicalTableScan(table=[[scott, DEPT]])
}))], variablesSet=[[$cor1]])
LogicalTableScan(table=[[scott, DEPT]])
})], variablesSet=[[$cor0]])
LogicalTableScan(table=[[scott, DEPT]])
{noformat}
> Query with two nested subqueries where the inner-most references the
> outer-most table returns wrong result
> ----------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-5789
> URL: https://issues.apache.org/jira/browse/CALCITE-5789
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Ruben Q L
> Priority: Major
> Attachments: debugger.png
>
>
> Problem can be reproduced with the following query (to be added e.g. in
> sub-query.iq):
> {code:sql}
> select deptno from dept d1 where exists (
> select 1 from dept d2 where d2.deptno = d1.deptno and exists (
> select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
> {code}
> The problem appears with at least two nested subqueries, when the inner most
> references the outermost table, in our case {{and d3.dname = d1.dname}} (if
> we remove this expression, the problem does not occur).
> When the above query is processed, the following plan is generated (notice
> how the top-most projection contains two fields instead of one):
> {noformat}
> // Plan2 (Decorrelator output)
> LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
> LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
> LogicalProject(DEPTNO=[$0], DNAME=[$1])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
> LogicalFilter(condition=[IS NOT NULL($1)])
> LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
> Even if this minimalist test does not fail, it leads to a situation where the
> {{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match
> (see attached screenshot) , which can lead to unforeseeable consequences (for
> the record, I have seen more complex queries which do fail at execution time
> because of this issue).
> The culprit that generates the unexpected extra field in the final projection
> is RelDecorrelator, however I think the decorrelator is not to blame here,
> because the input that reaches it is already wrong:
> {noformat}
> // Plan1 (SubQueryRemoveRule output, Decorrelator input)
> LogicalProject(DEPTNO=[$0])
> LogicalProject(DEPTNO=[$0], DNAME=[$1])
> LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> requiredColumns=[{0, 1}])
> LogicalProject(DEPTNO=[$0], DNAME=[$1])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
> LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
> *** LogicalJoin(condition=[true], joinType=[inner],
> variablesSet=[[$cor1, $cor0]])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> *** LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1,
> $cor0.DNAME))])
> LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
> Notice how there is a correlated variable $cor1 used by the LogicalJoin,
> which is not generated by any LogicalCorrelation.
> Who generates this wrong plan? SubQueryRemoveRule, which takes the original
> plan and removes the nested subqueries with the questionable above result:
> {noformat}
> // Plan0 (Original, SubQueryRemoveRule input)
> LogicalProject(DEPTNO=[$0])
> LogicalFilter(condition=[EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
> LogicalTableScan(table=[[scott, DEPT]])
> }))], variablesSet=[[$cor1]])
> LogicalTableScan(table=[[scott, DEPT]])
> })], variablesSet=[[$cor0]])
> LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)