[ 
https://issues.apache.org/jira/browse/CALCITE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819916#comment-16819916
 ] 

Danny Chan commented on CALCITE-2948:
-------------------------------------

Hi [~hyuan] [~zabetak] [~julianhyde] [~vgarg]  I have applied a 
[PR#1169|[https://github.com/apache/calcite/pull/1169]]

The main diff it to add 2 rules in post-decorrelation phrase: 
SemiJoinRule.PROJECT and SemiJoinRemoveRule.PROJECT.
The original plan after decorrelation is:
{code:java}
LogicalProject(DEPTNO=[$7])
  LogicalJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalAggregate(group=[{0, 1}])
      LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
        LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalProject(EMPNO=[$0])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
The plan is very same with what SubQueryRemoveRule produces. To reduce the 
right agg of the above join, we can use rule SemiJoinRule.PROJECT[1], then plan 
will change to:
{code:java}
LogicalProject(DEPTNO=[$7])
  SemiJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
      LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalProject(EMPNO=[$0])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
Add after applying SemiJoinRemoveRule.PROJECT will get the final plan:
{code:java}
LogicalProject(DEPTNO=[$7])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalProject(EMPNO=[$0])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
There are 2 points that i want to discuss:
1. The new added rule SemiJoinRemoveRule.PROJECT and SemiJoinRemoveRule.Join 
only match both the joins left sides as a TableScan, which i think could be 
more common, e.g. a RelNode, but i only decide if the 2 are equals with each 
other with the node's digest, i'm not very confident if i change to RelNode it 
would be still correct.
2. In order to remove the above join, the join keys of it must all be unique, 
or the expanded join rows of same keys would be lost. But now when we use 
SubqueryRemoveRule or RelDecorrelator to decorrelate the in subquery, this info 
is still ignored when we build a join(the join rows would swell), so i ignore 
the uniqueniess either as default for SemiJoinRemoveRule, but i still add a 
flag in it to let user control the behavior.
[1] 
https://github.com/apache/calcite/blob/b03cdc486cf5c7232bbc6fa9b5f02f564e9601c3/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L121

> SqlToRelConverter generates complicated logical plan for in subquery with 
> non-equi condition
> --------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2948
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2948
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Assignee: Danny Chan
>            Priority: Major
>              Labels: pull-request-available, sub-query
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro:
> Add the following test to SqlToRelConverterTest.java.
> {code:java}
> @Test public void testSubQueryIN() {
>     final String sql = "select deptno\n"
>         + "from EMP e\n"
>         + "where deptno in (select deptno\n"
>         + "from EMP where empno=e.empno+1)";
>     sql(sql).ok();
>   }
> {code}
> Plan:
> {code:java}
> LogicalProject(DEPTNO=[$7])
>   LogicalJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalAggregate(group=[{0, 1}])
>       LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
>         LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>           LogicalProject(EMPNO=[$0])
>             LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> One join would suffice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to