Vineet Garg created CALCITE-1624:
------------------------------------
Summary: Inefficient plan for NOT IN correlated subqueries
Key: CALCITE-1624
URL: https://issues.apache.org/jira/browse/CALCITE-1624
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Vineet Garg
Assignee: Julian Hyde
I just noticed that {{NOT IN}} correlated subqueries produces an extra
un-neccessary join after de-correlation (this is an addition to un-necessary
joins reported in CALCITE-1494)
Query
{code:SQL}
select sal from emp
where empno NOT IN (
select deptno from dept
where emp.job = dept.name)
{code}
Plan after subquery remove rule:
{code}
LogicalProject(SAL=[$5])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalFilter(condition=[NOT(CASE(=($9, 0), false, IS NOT NULL($12), true,
<($10, $9), true, false))])
LogicalCorrelate(correlation=[$cor0], joinType=[LEFT],
requiredColumns=[{2}])
LogicalCorrelate(correlation=[$cor0], joinType=[LEFT],
requiredColumns=[{2}])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
LogicalProject(DEPTNO=[$0])
LogicalFilter(condition=[=($cor0.JOB, $1)])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalFilter(condition=[=($cor0.EMPNO, $0)])
LogicalAggregate(group=[{0, 1}])
LogicalProject(DEPTNO=[$0], i=[true])
LogicalProject(DEPTNO=[$0])
LogicalFilter(condition=[=($cor0.JOB, $1)])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
Plan after de-correlation
{code}
LogicalProject(SAL=[$5])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalFilter(condition=[NOT(CASE(=($10, 0), false, IS NOT NULL($14), true,
<($11, $10), true, false))])
LogicalJoin(condition=[AND(=($0, $15), =($2, $13))], joinType=[left])
LogicalJoin(condition=[=($2, $9)], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{0}], c=[COUNT()], ck=[COUNT($1)])
LogicalProject(JOB=[$1], DEPTNO=[$0])
LogicalProject(DEPTNO=[$0], JOB=[$2])
LogicalJoin(condition=[=($2, $1)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(JOB=[$2])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[=($3, $0)], joinType=[inner]) // <==
Un-necessary join
LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true])
LogicalAggregate(group=[{0, 1}])
LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1])
LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1])
LogicalProject(DEPTNO=[$0], JOB=[$2])
LogicalJoin(condition=[=($2, $1)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalAggregate(group=[{0}])
LogicalProject(JOB=[$2])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{0}])
LogicalProject(EMPNO=[$0])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
As you can see in plan after de-correlation there is an un-necessary inner join.
This is not reproducible on CALCITE-1494's branch. But since this is a separate
issue from CALCITE-1494 I decided to open a separate JIRA.
Feel free to mark is duplicate or close it if you think otherwise.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)