[
https://issues.apache.org/jira/browse/CALCITE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754350#comment-17754350
]
Julian Hyde commented on CALCITE-5880:
--------------------------------------
I tried a simple query:
{code}
> explain plan for
> select * from scott_clone.emp as e1
> where exists (
> select 1 from scott_clone.emp as e2
> where e2.comm is not distinct from e1.comm);
EnumerableCalc(expr#0..8=[{inputs}], proj#0..7=[{exprs}])
EnumerableHashJoin(condition=[AND(=($6, $11), =($8, $12))], joinType=[semi])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)],
proj#0..8=[{exprs}])
EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1, $2))],
joinType=[inner])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
EnumerableAggregate(group=[{0, 1}])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], COMM=[$t6],
$f8=[$t8])
EnumerableTableScan(table=[[SCOTT_CLONE, EMP]])
{code}
I used the {{comm}} column because it is nullable in the standard {{EMP}}
table. And sure enough the plan handles null values: {{OR(AND(IS NULL($1), $3),
=($1, $2))}}.
Are you sure that the optimizer is not optimizing {{IS NOT DISTINCT FROM}} to
{{=}} because one of your columns is {{NOT NULL}}?
> When the association condition of the association subquery is 'is not
> distinct from', the join condition becomes '=' after decorrelation.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-5880
> URL: https://issues.apache.org/jira/browse/CALCITE-5880
> Project: Calcite
> Issue Type: Bug
> Reporter: libopeng
> Assignee: LakeShen
> Priority: Major
> Attachments: image-2023-08-01-15-20-22-105.png,
> image-2023-08-02-10-15-00-455.png, image-2023-08-13-21-36-00-494.png
>
>
> {code:java}
> select EMPNO
> from emp
> where EXISTS (select DEPTNO from dept where emp.EMPNO is not DISTINCT from
> dept.DEPTNO){code}
> before decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
> LogicalFilter(condition=[IS NOT NULL($9)])
> LogicalCorrelate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{0}])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[MIN($0)])
> LogicalProject($f0=[true])
> LogicalFilter(condition=[OR(AND(IS NULL($cor0.EMPNO), IS NULL($0)),
> IS TRUE(=($cor0.EMPNO, $0)))]) <--- here is 't1.x is not distinct from
> t2.x'
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> after decorrelation
> {code:java}
> LogicalProject(EMPNO=[$0])
> LogicalFilter(condition=[IS NOT NULL($9)])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$13])
> LogicalJoin(condition=[AND(=($0, $11), =($9, $12))], joinType=[left])
> <---here is 't1.x=t2.x and (t1.x is null = t2.x is null)'
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3],
> HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f9=[false],
> EMPNO0=[$0])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)])
> LogicalProject(EMPNO=[$2], $f9=[$3], $f0=[true])
> LogicalJoin(condition=[=($2, $0)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalProject(EMPNO=[$0], $f9=[false])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
>
> {code:java}
> EMP | DEPT
> +---------+ | +---------+
> | EMPNO | | | DEPTNO |
> +---------+ | +---------+
> | null | | | null |
> {code}
> expect result
> {code:java}
> EMPNO
> +-------+
> | null |
> +-------+{code}
> actual result
> {code:java}
> EMPNO
> +-------+
> +-------+ {code}
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)