[ 
https://issues.apache.org/jira/browse/CALCITE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Q L updated CALCITE-5789:
-------------------------------
    Description: 
Problem can be reproduced with the following query (to be added e.g. in 
sub-query.iq):
{code:sql}
select deptno from dept d1 where exists (
 select 1 from dept d2 where d2.deptno = d1.deptno and exists (
  select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
{code}

The problem appears with at least two nested subqueries, when the inner most 
references the outermost table, in our case {{and d3.dname = d1.dname}} (if we 
remove this expression, the problem does not occur).

When the above query is processed, the following plan is generated (notice how 
the top-most projection contains two fields instead of one):
{noformat}
// Plan2 (Decorrelator output)
LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
  LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
    LogicalProject(DEPTNO=[$0], DNAME=[$1])
      LogicalTableScan(table=[[scott, DEPT]])
    LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[scott, DEPT]])
        LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
          LogicalFilter(condition=[IS NOT NULL($1)])
            LogicalTableScan(table=[[scott, DEPT]])
{noformat}

Even if this minimalist test does not fail, it leads to a situation where the 
{{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match  (see 
attached screenshot) , which can lead to unforeseeable consequences (for the 
record, I have seen more complex queries which do fail at execution time 
because of this issue).

The culprit that generates the unexpected extra field in the final projection 
is RelDecorrelator, however I think the decorrelator is not to blame here, 
because the input that reaches it is already wrong:
{noformat}
// Plan1 (SubQueryRemoveRule output, Decorrelator input)
LogicalProject(DEPTNO=[$0])
  LogicalProject(DEPTNO=[$0], DNAME=[$1])
    LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
requiredColumns=[{0, 1}])
      LogicalProject(DEPTNO=[$0], DNAME=[$1])
        LogicalTableScan(table=[[scott, DEPT]])
      LogicalAggregate(group=[{0}])
        LogicalProject(i=[true])
          LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
            LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
***           LogicalJoin(condition=[true], joinType=[inner], 
variablesSet=[[$cor1, $cor0]])
                LogicalTableScan(table=[[scott, DEPT]])
                LogicalAggregate(group=[{0}])
                  LogicalProject(i=[true])
***                 LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, 
$cor0.DNAME))])
                      LogicalTableScan(table=[[scott, DEPT]])
{noformat}

Notice how there is a correlated variable $cor1 used by the LogicalJoin, which 
is not generated by any LogicalCorrelate.
Who generates this wrong plan? SubQueryRemoveRule, which takes the original 
plan and removes the nested subqueries with the questionable above result:
{noformat}
// Plan0 (Original, SubQueryRemoveRule input)
LogicalProject(DEPTNO=[$0])
  LogicalFilter(condition=[EXISTS({
    LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
      LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
        LogicalTableScan(table=[[scott, DEPT]])
    }))], variablesSet=[[$cor1]])
      LogicalTableScan(table=[[scott, DEPT]])
  })], variablesSet=[[$cor0]])
    LogicalTableScan(table=[[scott, DEPT]])
{noformat}


 


  was:
Problem can be reproduced with the following query (to be added e.g. in 
sub-query.iq):
{code:sql}
select deptno from dept d1 where exists (
 select 1 from dept d2 where d2.deptno = d1.deptno and exists (
  select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
{code}

The problem appears with at least two nested subqueries, when the inner most 
references the outermost table, in our case {{and d3.dname = d1.dname}} (if we 
remove this expression, the problem does not occur).

When the above query is processed, the following plan is generated (notice how 
the top-most projection contains two fields instead of one):
{noformat}
// Plan2 (Decorrelator output)
LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
  LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
    LogicalProject(DEPTNO=[$0], DNAME=[$1])
      LogicalTableScan(table=[[scott, DEPT]])
    LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[scott, DEPT]])
        LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
          LogicalFilter(condition=[IS NOT NULL($1)])
            LogicalTableScan(table=[[scott, DEPT]])
{noformat}

Even if this minimalist test does not fail, it leads to a situation where the 
{{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match  (see 
attached screenshot) , which can lead to unforeseeable consequences (for the 
record, I have seen more complex queries which do fail at execution time 
because of this issue).

The culprit that generates the unexpected extra field in the final projection 
is RelDecorrelator, however I think the decorrelator is not to blame here, 
because the input that reaches it is already wrong:
{noformat}
// Plan1 (SubQueryRemoveRule output, Decorrelator input)
LogicalProject(DEPTNO=[$0])
  LogicalProject(DEPTNO=[$0], DNAME=[$1])
    LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
requiredColumns=[{0, 1}])
      LogicalProject(DEPTNO=[$0], DNAME=[$1])
        LogicalTableScan(table=[[scott, DEPT]])
      LogicalAggregate(group=[{0}])
        LogicalProject(i=[true])
          LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
            LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
***           LogicalJoin(condition=[true], joinType=[inner], 
variablesSet=[[$cor1, $cor0]])
                LogicalTableScan(table=[[scott, DEPT]])
                LogicalAggregate(group=[{0}])
                  LogicalProject(i=[true])
***                 LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, 
$cor0.DNAME))])
                      LogicalTableScan(table=[[scott, DEPT]])
{noformat}

Notice how there is a correlated variable $cor1 used by the LogicalJoin, which 
is not generated by any LogicalCorrelation.
Who generates this wrong plan? SubQueryRemoveRule, which takes the original 
plan and removes the nested subqueries with the questionable above result:
{noformat}
// Plan0 (Original, SubQueryRemoveRule input)
LogicalProject(DEPTNO=[$0])
  LogicalFilter(condition=[EXISTS({
    LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
      LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
        LogicalTableScan(table=[[scott, DEPT]])
    }))], variablesSet=[[$cor1]])
      LogicalTableScan(table=[[scott, DEPT]])
  })], variablesSet=[[$cor0]])
    LogicalTableScan(table=[[scott, DEPT]])
{noformat}


 



> Query with two nested subqueries where the inner-most references the 
> outer-most table returns wrong result
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5789
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5789
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Ruben Q L
>            Priority: Major
>         Attachments: debugger.png
>
>
> Problem can be reproduced with the following query (to be added e.g. in 
> sub-query.iq):
> {code:sql}
> select deptno from dept d1 where exists (
>  select 1 from dept d2 where d2.deptno = d1.deptno and exists (
>   select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
> {code}
> The problem appears with at least two nested subqueries, when the inner most 
> references the outermost table, in our case {{and d3.dname = d1.dname}} (if 
> we remove this expression, the problem does not occur).
> When the above query is processed, the following plan is generated (notice 
> how the top-most projection contains two fields instead of one):
> {noformat}
> // Plan2 (Decorrelator output)
> LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
>   LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
>     LogicalProject(DEPTNO=[$0], DNAME=[$1])
>       LogicalTableScan(table=[[scott, DEPT]])
>     LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
>       LogicalJoin(condition=[true], joinType=[inner])
>         LogicalTableScan(table=[[scott, DEPT]])
>         LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
>           LogicalFilter(condition=[IS NOT NULL($1)])
>             LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
> Even if this minimalist test does not fail, it leads to a situation where the 
> {{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match  
> (see attached screenshot) , which can lead to unforeseeable consequences (for 
> the record, I have seen more complex queries which do fail at execution time 
> because of this issue).
> The culprit that generates the unexpected extra field in the final projection 
> is RelDecorrelator, however I think the decorrelator is not to blame here, 
> because the input that reaches it is already wrong:
> {noformat}
> // Plan1 (SubQueryRemoveRule output, Decorrelator input)
> LogicalProject(DEPTNO=[$0])
>   LogicalProject(DEPTNO=[$0], DNAME=[$1])
>     LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{0, 1}])
>       LogicalProject(DEPTNO=[$0], DNAME=[$1])
>         LogicalTableScan(table=[[scott, DEPT]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(i=[true])
>           LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
>             LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
> ***           LogicalJoin(condition=[true], joinType=[inner], 
> variablesSet=[[$cor1, $cor0]])
>                 LogicalTableScan(table=[[scott, DEPT]])
>                 LogicalAggregate(group=[{0}])
>                   LogicalProject(i=[true])
> ***                 LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, 
> $cor0.DNAME))])
>                       LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
> Notice how there is a correlated variable $cor1 used by the LogicalJoin, 
> which is not generated by any LogicalCorrelate.
> Who generates this wrong plan? SubQueryRemoveRule, which takes the original 
> plan and removes the nested subqueries with the questionable above result:
> {noformat}
> // Plan0 (Original, SubQueryRemoveRule input)
> LogicalProject(DEPTNO=[$0])
>   LogicalFilter(condition=[EXISTS({
>     LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
>       LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
>         LogicalTableScan(table=[[scott, DEPT]])
>     }))], variablesSet=[[$cor1]])
>       LogicalTableScan(table=[[scott, DEPT]])
>   })], variablesSet=[[$cor0]])
>     LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to