[ 
https://issues.apache.org/jira/browse/CALCITE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Runkang He updated CALCITE-5680:
--------------------------------
    Description: 
When the query contains multiple IN sub-queries with only literal operands, and 
connected with OR predicate in WHERE clause, the result is wrong. The minimal 
reproducer is below:
SQL:

{code:sql}
select empno from sales.empnullables
where 1 in (
  select deptno from sales.deptnullables where name = 'dept1')
or 2 in (
  select deptno from sales.deptnullables where name = 'dept2')
{code}

The Plan generated by calcite master branch: (Notice the bold part of IS 
NULL(*$2*) in the downstream LogicalFilter)

{code:sql}
LogicalProject(EMPNO=[$0])
  LogicalFilter(condition=[OR(CASE(IS NULL($2), false, =($1, false), 
null:BOOLEAN, IS NOT NULL($1), true, false), CASE(IS NULL(*$2*), false, =($1, 
false), null:BOOLEAN, IS NOT NULL($1), true, false))])
    LogicalJoin(condition=[true], joinType=[left])
      LogicalJoin(condition=[true], joinType=[left])
        LogicalProject(EMPNO=[$0])
          LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
        LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
          LogicalAggregate(group=[{0}], c=[COUNT()])
            LogicalProject(cs=[IS NOT NULL($0)])
              LogicalFilter(condition=[OR(=(1, $0), IS NULL($0))])
                LogicalProject(DEPTNO=[$0])
                  LogicalFilter(condition=[=($1, 'dept1')])
                    LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
      LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
        LogicalAggregate(group=[{0}], c=[COUNT()])
          LogicalProject(cs=[IS NOT NULL($0)])
            LogicalFilter(condition=[OR(=(2, $0), IS NULL($0))])
              LogicalProject(DEPTNO=[$0])
                LogicalFilter(condition=[=($1, 'dept2')])
                  LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
{code}

The wrong part is that when build the downstream LogicalFilter for the two 
sub-queries, the filter for the second sub-query is CASE(IS NULL($2), false, 
=($1, false), null:BOOLEAN, IS NOT NULL($1), true, false), notice that *$2 
should be the second sub-query's intermediate table field dt.c(which field 
index is $4), and $1 should be the second sub-query's intermediate table field 
dt.cs(which field index is $3), but now the actual reference is the first 
sub-query's, this leads to wrong plan, and wrong result*.

The root cause is that intermediate table's field name is the same as the 
previous sub-query's, and when lookup intermediate table field, it always 
returns the previous one which is not belong to the current subquery.

To fix this issue, we should use both different intermediate table name and 
field name to identify the reference correctly. (Note that we make intermediate 
table name different in 
[Calcite-5655|https://issues.apache.org/jira/browse/CALCITE-5655])

  was:
When the query contains multiple IN sub-queries with only literal operands, and 
connected with OR predicate in WHERE clause, the result is wrong. The minimal 
reproducer is below:
SQL:

{code:sql}
select empno from sales.empnullables
where 1 in (
  select deptno from sales.deptnullables where name = 'dept1')
or 2 in (
  select deptno from sales.deptnullables where name = 'dept2')
{code}

The Plan generated by calcite master branch: (Notice the bold part of IS 
NULL(*$2*) in the downstream LogicalFilter)

{code:sql}
LogicalProject(EMPNO=[$0])
  LogicalFilter(condition=[OR(CASE(IS NULL($2), false, =($1, false), 
null:BOOLEAN, IS NOT NULL($1), true, false), CASE(IS NULL(*$2*), false, =($1, 
false), null:BOOLEAN, IS NOT NULL($1), true, false))])
    LogicalJoin(condition=[true], joinType=[left])
      LogicalJoin(condition=[true], joinType=[left])
        LogicalProject(EMPNO=[$0])
          LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
        LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
          LogicalAggregate(group=[{0}], c=[COUNT()])
            LogicalProject(cs=[IS NOT NULL($0)])
              LogicalFilter(condition=[OR(=(1, $0), IS NULL($0))])
                LogicalProject(DEPTNO=[$0])
                  LogicalFilter(condition=[=($1, 'dept1')])
                    LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
      LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
        LogicalAggregate(group=[{0}], c=[COUNT()])
          LogicalProject(cs=[IS NOT NULL($0)])
            LogicalFilter(condition=[OR(=(2, $0), IS NULL($0))])
              LogicalProject(DEPTNO=[$0])
                LogicalFilter(condition=[=($1, 'dept2')])
                  LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
{code}

The wrong part is that when build the downstream LogicalFilter for the two 
sub-queries, the filter for the second sub-query is CASE(IS NULL($2), false, 
=($1, false), null:BOOLEAN, IS NOT NULL($1), true, false), notice that *$2 
should be the second sub-query's intermediate table field dt.c(which field 
index is $4), and $1 should be the second sub-query's intermediate table field 
dt.cs(which field index is $3), but now the actual reference is the first 
sub-query's, this leads to wrong plan, and wrong result*.

The root cause is that intermediate table alias is the same as the previous 
sub-query's, but when lookup intermediate table field, it always returns the 
previous one which is not belong to the current subquery.


> Wrong plan for multiple IN sub-queries with only literal operands
> -----------------------------------------------------------------
>
>                 Key: CALCITE-5680
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5680
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Runkang He
>            Assignee: Runkang He
>            Priority: Major
>              Labels: pull-request-available
>
> When the query contains multiple IN sub-queries with only literal operands, 
> and connected with OR predicate in WHERE clause, the result is wrong. The 
> minimal reproducer is below:
> SQL:
> {code:sql}
> select empno from sales.empnullables
> where 1 in (
>   select deptno from sales.deptnullables where name = 'dept1')
> or 2 in (
>   select deptno from sales.deptnullables where name = 'dept2')
> {code}
> The Plan generated by calcite master branch: (Notice the bold part of IS 
> NULL(*$2*) in the downstream LogicalFilter)
> {code:sql}
> LogicalProject(EMPNO=[$0])
>   LogicalFilter(condition=[OR(CASE(IS NULL($2), false, =($1, false), 
> null:BOOLEAN, IS NOT NULL($1), true, false), CASE(IS NULL(*$2*), false, =($1, 
> false), null:BOOLEAN, IS NOT NULL($1), true, false))])
>     LogicalJoin(condition=[true], joinType=[left])
>       LogicalJoin(condition=[true], joinType=[left])
>         LogicalProject(EMPNO=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
>         LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
>           LogicalAggregate(group=[{0}], c=[COUNT()])
>             LogicalProject(cs=[IS NOT NULL($0)])
>               LogicalFilter(condition=[OR(=(1, $0), IS NULL($0))])
>                 LogicalProject(DEPTNO=[$0])
>                   LogicalFilter(condition=[=($1, 'dept1')])
>                     LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
>       LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
>         LogicalAggregate(group=[{0}], c=[COUNT()])
>           LogicalProject(cs=[IS NOT NULL($0)])
>             LogicalFilter(condition=[OR(=(2, $0), IS NULL($0))])
>               LogicalProject(DEPTNO=[$0])
>                 LogicalFilter(condition=[=($1, 'dept2')])
>                   LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> {code}
> The wrong part is that when build the downstream LogicalFilter for the two 
> sub-queries, the filter for the second sub-query is CASE(IS NULL($2), false, 
> =($1, false), null:BOOLEAN, IS NOT NULL($1), true, false), notice that *$2 
> should be the second sub-query's intermediate table field dt.c(which field 
> index is $4), and $1 should be the second sub-query's intermediate table 
> field dt.cs(which field index is $3), but now the actual reference is the 
> first sub-query's, this leads to wrong plan, and wrong result*.
> The root cause is that intermediate table's field name is the same as the 
> previous sub-query's, and when lookup intermediate table field, it always 
> returns the previous one which is not belong to the current subquery.
> To fix this issue, we should use both different intermediate table name and 
> field name to identify the reference correctly. (Note that we make 
> intermediate table name different in 
> [Calcite-5655|https://issues.apache.org/jira/browse/CALCITE-5655])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to