[ https://issues.apache.org/jira/browse/CALCITE-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vineet Garg updated CALCITE-1493: --------------------------------- Description: Plan generated by calcite with SubqueryRemoveRule followed by de-correlation for the following query: {noformat} select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name) {noformat} is {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($11)]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN]) LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} As you can notice instead of doing Left Outer Join Calcite is doing Inner Join. This will produce wrong results. Plan for same query just before SubqueryRemove Rule is: {noformat} LogicalProject(SAL=[$5]) LogicalFilter(condition=[NOT(IN($0, { LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) }))], variablesSet=[[$cor0]]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} Plan just after SubqueryRemove Rule: {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($10)]) LogicalFilter(condition=[=($0, $9)]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {noformat} Looking at above it seems RelDecorrelator have some issue where it is coming up with Inner Join. was: Plan generated by calcite with SubqueryRemoveRule followed by de-correlation for the following query: {noformat} select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name) {noformat} is {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($11)]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN]) LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} As you can notice instead of doing {noformat} Left Outer Join {noformat} Calcite is doing {noformat} Inner Join {noformat}. This will produce wrong results. Plan for same query just before SubqueryRemove Rule is: {noformat} LogicalProject(SAL=[$5]) LogicalFilter(condition=[NOT(IN($0, { LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) }))], variablesSet=[[$cor0]]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {noformat} Plan just after SubqueryRemove Rule: {noformat} LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[IS NULL($10)]) LogicalFilter(condition=[=($0, $9)]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {noformat} Looking at above it seems RelDecorrelator have some issue where it is coming up with Inner Join. > Wrong plan for NOT IN correlated queries > ---------------------------------------- > > Key: CALCITE-1493 > URL: https://issues.apache.org/jira/browse/CALCITE-1493 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Vineet Garg > Assignee: Julian Hyde > > Plan generated by calcite with SubqueryRemoveRule followed by de-correlation > for the following query: > {noformat} select sal from emp where empno NOT IN (select deptno from dept > where emp.job = dept.name) {noformat} > is > {noformat} > LogicalProject(SAL=[$5]) > LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], > SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) > LogicalFilter(condition=[IS NULL($11)]) > LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], > HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], > DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET > "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN]) > LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) > LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) > LogicalProject(DEPTNO=[$0], JOB=[$2]) > LogicalJoin(condition=[=($2, $1)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > LogicalAggregate(group=[{0}]) > LogicalProject(JOB=[$2]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > {noformat} > As you can notice instead of doing Left Outer Join Calcite is doing Inner > Join. This will produce wrong results. > Plan for same query just before SubqueryRemove Rule is: > {noformat} > LogicalProject(SAL=[$5]) > LogicalFilter(condition=[NOT(IN($0, { > LogicalProject(DEPTNO=[$0]) > LogicalFilter(condition=[=($cor0.JOB, $1)]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > }))], variablesSet=[[$cor0]]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > {noformat} > Plan just after SubqueryRemove Rule: > {noformat} > LogicalProject(SAL=[$5]) > LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], > SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) > LogicalFilter(condition=[IS NULL($10)]) > LogicalFilter(condition=[=($0, $9)]) > LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], > requiredColumns=[{2}]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(DEPTNO=[$0], i=[true]) > LogicalProject(DEPTNO=[$0]) > LogicalFilter(condition=[=($cor0.JOB, $1)]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {noformat} > Looking at above it seems RelDecorrelator have some issue where it is coming > up with Inner Join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)