[
https://issues.apache.org/jira/browse/CALCITE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947529#comment-17947529
]
suibianwanwan commented on CALCITE-6962:
----------------------------------------
Upon further analysis, I realized the issue was caused by ignoring {{NULL}}
equality when handling {*}Correlate{*}.
As described in the formula derivation in *Unnesting Arbitrary Queries*: _"Note
that unless indicated otherwise this operator has IS semantics, i.e., it
compares NULL values as equal."_
This is also reflected in the pseudo code from the paper *Improving Unnesting
of Complex Queries:*
{code:java}
fun unnest(join, info, accessing):
// Split accessing into accessingLeft and accessingRight for input of join
if accessing(join) is not empty:
dJoinElimination(join, info, accessing)
return
// Check if only one side accesses outer columns
if accessingRight is empty and info.join cannot output unmatched from the
right:
unnest(join.left, info, accessingLeft)
rewriteColumns(join.condition, info)
return if accessingLeft is empty and info.join cannot output unmatched
from the left:
unnest(join.right, info, accessingRight)
rewriteColumns(join.condition, info)
return // Unnest both sides
unnestingLeft = new Unnesting(info.info)
unnestingRight = new Unnesting(info.info)
unnest(join.left, unnestLeft, accessingLeft)
unnest(join.right, unnestRight, accessingRight)
rewriteColumnsForJoin(join.condition, unnestingLeft, unnestingRight)
for c in info.outerRefs:
add "{unnestLeft.repr[c]}␣is␣not␣distinct␣from␣{unnestRight.repr[c]}" to
join.condition merge cclasses and repr from unnestLeft and unnestRight into
info {code}
> Exists subquery returns incorrect result when or condition involves null
> column
> -------------------------------------------------------------------------------
>
> Key: CALCITE-6962
> URL: https://issues.apache.org/jira/browse/CALCITE-6962
> Project: Calcite
> Issue Type: Bug
> Reporter: suibianwanwan
> Assignee: suibianwanwan
> Priority: Major
>
> Test in sub-query.iq
> {code:java}
> select *
> from "scott".emp as e
> where exists (
> select empno
> from "scott".emp as ee
> where e.empno = ee.empno or e.comm >= ee.sal
> )
> {code}
> Expected result:
> {code:java}
> +-------+--------+-----------+------+------------+---------+---------+--------+
> | EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO
> |
> +-------+--------+-----------+------+------------+---------+---------+--------+
> | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.00 | | 20
> |
> | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30
> |
> | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30
> |
> | 7566 | JONES | MANAGER | 7839 | 1981-02-04 | 2975.00 | | 20
> |
> | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30
> |
> | 7698 | BLAKE | MANAGER | 7839 | 1981-01-05 | 2850.00 | | 30
> |
> | 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | | 10
> |
> | 7788 | SCOTT | ANALYST | 7566 | 1987-04-19 | 3000.00 | | 20
> |
> | 7839 | KING | PRESIDENT | | 1981-11-17 | 5000.00 | | 10
> |
> | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30
> |
> | 7876 | ADAMS | CLERK | 7788 | 1987-05-23 | 1100.00 | | 20
> |
> | 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950.00 | | 30
> |
> | 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | | 20
> |
> | 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300.00 | | 10
> |
> +-------+--------+-----------+------+------------+---------+---------+--------+
> (14 rows)
> {code}
> Actual result:
> {code:java}
> +-------+--------+----------+------+------------+---------+---------+--------+
> | EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
> +-------+--------+----------+------+------------+---------+---------+--------+
> | 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30 |
> | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30 |
> | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30 |
> | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30 |
> +-------+--------+----------+------+------------+---------+---------+--------+
> {code}
> Plan:
> {code:java}
> EnumerableSort(sort0=[$0], sort1=[$5], dir0=[ASC], dir1=[ASC])
> EnumerableHashJoin(condition=[AND(=($0, $10), =($6, $11))], joinType=[semi])
> EnumerableTableScan(table=[[scott, EMP]])
> EnumerableNestedLoopJoin(condition=[OR(=($2, $0), >=($3, $1))],
> joinType=[inner])
> EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5])
> EnumerableTableScan(table=[[scott, EMP]])
> EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
> EnumerableTableScan(table=[[scott, EMP]])
> {code}
> Rows where {{COMM}} is null were excluded from the result set due to the join
> condition. This is likely caused by a semantic mismatch between the original
> {{EXISTS}} clause and its translated semi-join form:
> {code:java}
> EnumerableHashJoin(condition=[AND(=($0, $10), =($6, $11))],
> joinType=[semi]){code}
> If comm is null,
> {code:java}
> e.empno = ee.empno or e.comm >= ee.sal{code}
> evaluates to true, but
> {code:java}
> AND(=($0, $10), =($6, $11)){code}
> evaluates to false.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)