[ 
https://issues.apache.org/jira/browse/CALCITE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756664#comment-16756664
 ] 

Volodymyr Vysotskyi commented on CALCITE-2329:
----------------------------------------------

[~vgarg], I agree with you that such plan will produce wrong results, since it 
may change resulting rows number. I have submitted a similar query for the 
current master, and resulting plan contains the required aggregate operator:
{noformat}
select \"empid\" from \"hr\".\"emps\" where 2 in (select \"deptno\" from 
\"hr\".\"emps\")

LogicalProject(empid=[$0]): rowcount = 100.0, cumulative cost = {631.0 rows, 
617.0 cpu, 0.0 io}, id = 17
  LogicalJoin(condition=[true], joinType=[inner]): rowcount = 100.0, cumulative 
cost = {531.0 rows, 517.0 cpu, 0.0 io}, id = 16
    LogicalProject(empid=[$0]): rowcount = 100.0, cumulative cost = {200.0 
rows, 201.0 cpu, 0.0 io}, id = 9
      EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative 
cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
    LogicalAggregate(group=[{0}]): rowcount = 1.0, cumulative cost = {231.0 
rows, 316.0 cpu, 0.0 io}, id = 15
      LogicalProject(cs=[true]): rowcount = 15.0, cumulative cost = {230.0 
rows, 316.0 cpu, 0.0 io}, id = 14
        LogicalFilter(condition=[=(2, $0)]): rowcount = 15.0, cumulative cost = 
{215.0 rows, 301.0 cpu, 0.0 io}, id = 13
          LogicalProject(deptno=[$1]): rowcount = 100.0, cumulative cost = 
{200.0 rows, 201.0 cpu, 0.0 io}, id = 2
            EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, 
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
{noformat}


> Enhance SubQueryRemoveRule to rewrite IN operator with the constant from the 
> left side more optimally
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2329
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2329
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Currently, for queries like this:
> {code:sql}
> select sal from emp where 2 IN (select deptno from dept)
> {code}
> SubQueryRemoveRule rules expand query plan in such a way:
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[=(2, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalProject(DEPTNO=[$0])
>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
> Since join condition is actually a filter condition, it will be pushed into 
> the filter during further planning stages and join with the true condition is 
> left.
> But these types of the queries may be rewritten in the same way as EXISTS 
> queries:
> {code:sql}
> select sal from emp where EXISTS (select deptno from dept where deptno=2)
> {code}
> with the more optimal plan:
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[true], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(i=[true])
>           LogicalFilter(condition=[=($0, 2)])
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to