[ 
https://issues.apache.org/jira/browse/CALCITE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Runkang He updated CALCITE-5655:
--------------------------------
    Description: 
When a query contains multiple IN/SOME sub-queries connected by an OR 
predicate in the WHERE clause, the result is wrong. A minimal reproducer is below:
SQL:
{code:SQL}
select empno from sales.empnullables
where deptno in (
  select deptno from sales.deptnullables where name = 'dept1')
or deptno in (
  select deptno from sales.deptnullables where name = 'dept2')
{code}
The plan generated by the Calcite master branch (note the part marked with 
asterisks in the downstream LogicalFilter):

{code:SQL}
LogicalProject(EMPNO=[$0])
  LogicalProject(EMPNO=[$0], DEPTNO=[$1])
    LogicalFilter(condition=[OR(AND(<>($2, 0), IS NOT NULL($5), IS NOT NULL($1)), AND(*<>($2, 0)*, IS NOT NULL($9), IS NOT NULL($1)))])
      LogicalJoin(condition=[=($1, $8)], joinType=[left])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[=($1, $4)], joinType=[left])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalProject(EMPNO=[$0], DEPTNO=[$7])
                LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
              LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
                LogicalProject(DEPTNO=[$0])
                  LogicalFilter(condition=[=($1, 'dept1')])
                    LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
            LogicalProject(DEPTNO=[$0], i=[true])
              LogicalFilter(condition=[=($1, 'dept1')])
                LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
          LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
            LogicalProject(DEPTNO=[$0])
              LogicalFilter(condition=[=($1, 'dept2')])
                LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
        LogicalProject(DEPTNO=[$0], i=[true])
          LogicalFilter(condition=[=($1, 'dept2')])
            LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
{code}
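
To make the field indexes in the filter easier to follow, the sketch below computes where each join input starts in the row that feeds the LogicalFilter. The class, method, and input names are hypothetical and are not part of Calcite; the field counts are read off the plan above (each input contributes two fields).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JoinFieldLayout {
  /** Maps each join input, in left-to-right order, to the index of its first field. */
  static Map<String, Integer> offsets(String[] names, int[] fieldCounts) {
    Map<String, Integer> result = new LinkedHashMap<>();
    int offset = 0;
    for (int i = 0; i < names.length; i++) {
      result.put(names[i], offset);
      offset += fieldCounts[i];
    }
    return result;
  }

  /** Layout of the joined row in the plan above; the input names are made up for illustration. */
  static Map<String, Integer> planLayout() {
    String[] names = {
        "emp",   // EMPNO, DEPTNO
        "agg1",  // c, ck of the first sub-query
        "sub1",  // DEPTNO, i of the first sub-query
        "agg2",  // c, ck of the second sub-query
        "sub2"   // DEPTNO, i of the second sub-query
    };
    int[] fieldCounts = {2, 2, 2, 2, 2};
    return offsets(names, fieldCounts);
  }

  public static void main(String[] args) {
    planLayout().forEach(
        (name, offset) -> System.out.println(name + " starts at $" + offset));
  }
}
```

Under these assumptions the second sub-query's count field c sits at $6, which matches the join conditions =($1, $4) and =($1, $8) and the IS NOT NULL($5) and IS NOT NULL($9) checks in the plan.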

The problem is in the downstream LogicalFilter built for the two 
sub-queries. The filter for the second sub-query is AND(<>($2, 0), IS NOT 
NULL($9), IS NOT NULL($1)). Here $2 should refer to field c of the second 
sub-query's intermediate table ct (field index $6), but it actually refers to 
the first sub-query's field. This produces a wrong plan and a wrong result.
The root cause is that the intermediate table of the second sub-query has the 
same alias as that of the previous sub-query, so looking up an intermediate 
table field by alias always returns the previous sub-query's field, which does 
not belong to the current sub-query.
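
The alias collision can be illustrated with a small stand-alone sketch. It is not Calcite code: the class, the first-match lookup, and the aliases "emp", "dt", and "ct1" are assumptions made for illustration; only the alias "ct" and the field name "c" come from the description above. Using a distinct alias per sub-query is shown as one possible way to avoid the collision, not necessarily the fix that will be adopted.

```java
import java.util.ArrayList;
import java.util.List;

public class AliasLookupSketch {
  /** One field of the joined row: the alias of the input it came from, and its name. */
  static final class Field {
    final String alias;
    final String name;

    Field(String alias, String name) {
      this.alias = alias;
      this.name = name;
    }
  }

  /** Builds the ten-field row of the plan; the second count table gets the given alias. */
  static List<Field> joinedRow(String secondCountAlias) {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field("emp", "EMPNO"));          // $0
    fields.add(new Field("emp", "DEPTNO"));         // $1
    fields.add(new Field("ct", "c"));               // $2, first sub-query
    fields.add(new Field("ct", "ck"));              // $3
    fields.add(new Field("dt", "DEPTNO"));          // $4
    fields.add(new Field("dt", "i"));               // $5
    fields.add(new Field(secondCountAlias, "c"));   // $6, second sub-query
    fields.add(new Field(secondCountAlias, "ck"));  // $7
    fields.add(new Field("dt", "DEPTNO"));          // $8
    fields.add(new Field("dt", "i"));               // $9
    return fields;
  }

  /** Returns the index of the first field matching alias and name, or -1 if none matches. */
  static int lookupFirst(List<Field> fields, String alias, String name) {
    for (int i = 0; i < fields.size(); i++) {
      Field f = fields.get(i);
      if (f.alias.equals(alias) && f.name.equals(name)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Both sub-queries use alias "ct": the lookup stops at the first sub-query's field.
    System.out.println(lookupFirst(joinedRow("ct"), "ct", "c"));   // prints 2
    // A distinct alias for the second sub-query resolves to its own field.
    System.out.println(lookupFirst(joinedRow("ct1"), "ct1", "c")); // prints 6
  }
}
```

With the shared alias, the second branch of the OR tests the first sub-query's row count. If the 'dept1' sub-query returns no rows, <>($2, 0) is false in both branches, so employees whose deptno matches the 'dept2' sub-query are filtered out as well.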

  was:
When the query contains multiple IN/SOME sub-queries connected with OR 
predicate in WHERE clause, the result is wrong. The minimal reproducer is below:
SQL:
{{select empno from sales.empnullables
where deptno in (
  select deptno from sales.deptnullables where name = 'dept1')
or deptno in (
  select deptno from sales.deptnullables where name = 'dept2')}}
The Plan generated by calcite master branch: (Notice the bold part in the 
downstream LogicalFilter)
LogicalProject(EMPNO=[$0])
  LogicalProject(EMPNO=[$0], DEPTNO=[$1])
    LogicalFilter(condition=[OR(AND(<>($2, 0), IS NOT NULL($5), IS NOT 
NULL($1)), AND(<>($2, 0), IS NOT NULL($9), IS NOT NULL($1)))])
      LogicalJoin(condition=[=($1, $8)], joinType=[left])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[=($1, $4)], joinType=[left])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalProject(EMPNO=[$0], DEPTNO=[$7])
                LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
              LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
                LogicalProject(DEPTNO=[$0])
                  LogicalFilter(condition=[=($1, 'dept1')])
                    LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
            LogicalProject(DEPTNO=[$0], i=[true])
              LogicalFilter(condition=[=($1, 'dept1')])
                LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
          LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
            LogicalProject(DEPTNO=[$0])
              LogicalFilter(condition=[=($1, 'dept2')])
                LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
        LogicalProject(DEPTNO=[$0], i=[true])
          LogicalFilter(condition=[=($1, 'dept2')])
            LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
The wrong part is that when build the downstream LogicalFilter for the two 
sub-queries, the filter for the second sub-query is AND(<>($2, 0), IS NOT 
NULL($9), IS NOT NULL($1)), notice that $2 should be the second sub-query's 
intermediate table field ct.c(which field index is $6), but now the actual 
reference is the first sub-query's, this leads to wrong plan, and wrong result.
The root cause is that intermediate table alias is the same as the previous 
sub-query's, but when lookup intermediate table field, it always returns the 
previous one which is not belong to the current subquery. 


> Wrong plan for multiple IN/SOME sub-queries with OR predicate
> -------------------------------------------------------------
>
>                 Key: CALCITE-5655
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5655
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Runkang He
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
