[
https://issues.apache.org/jira/browse/CALCITE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756664#comment-16756664
]
Volodymyr Vysotskyi commented on CALCITE-2329:
----------------------------------------------
[~vgarg], I agree with you that such plan will produce wrong results, since it
may change resulting rows number. I have submitted a similar query for the
current master, and resulting plan contains the required aggregate operator:
{noformat}
select \"empid\" from \"hr\".\"emps\" where 2 in (select \"deptno\" from
\"hr\".\"emps\")
LogicalProject(empid=[$0]): rowcount = 100.0, cumulative cost = {631.0 rows,
617.0 cpu, 0.0 io}, id = 17
LogicalJoin(condition=[true], joinType=[inner]): rowcount = 100.0, cumulative
cost = {531.0 rows, 517.0 cpu, 0.0 io}, id = 16
LogicalProject(empid=[$0]): rowcount = 100.0, cumulative cost = {200.0
rows, 201.0 cpu, 0.0 io}, id = 9
EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative
cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
LogicalAggregate(group=[{0}]): rowcount = 1.0, cumulative cost = {231.0
rows, 316.0 cpu, 0.0 io}, id = 15
LogicalProject(cs=[true]): rowcount = 15.0, cumulative cost = {230.0
rows, 316.0 cpu, 0.0 io}, id = 14
LogicalFilter(condition=[=(2, $0)]): rowcount = 15.0, cumulative cost =
{215.0 rows, 301.0 cpu, 0.0 io}, id = 13
LogicalProject(deptno=[$1]): rowcount = 100.0, cumulative cost =
{200.0 rows, 201.0 cpu, 0.0 io}, id = 2
EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0,
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
{noformat}
> Enhance SubQueryRemoveRule to rewrite IN operator with the constant from the
> left side more optimally
> -----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-2329
> URL: https://issues.apache.org/jira/browse/CALCITE-2329
> Project: Calcite
> Issue Type: Improvement
> Reporter: Volodymyr Vysotskyi
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.17.0
>
>
> Currently, for queries like this:
> {code:sql}
> select sal from emp where 2 IN (select deptno from dept)
> {code}
> SubQueryRemoveRule rules expand query plan in such a way:
> {noformat}
> LogicalProject(SAL=[$5])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[=(2, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(DEPTNO=[$0])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
> Since join condition is actually a filter condition, it will be pushed into
> the filter during further planning stages and join with the true condition is
> left.
> But these types of the queries may be rewritten in the same way as EXISTS
> queries:
> {code:sql}
> select sal from emp where EXISTS (select deptno from dept where deptno=2)
> {code}
> with the more optimal plan:
> {noformat}
> LogicalProject(SAL=[$5])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> LogicalFilter(condition=[=($0, 2)])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)