[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Vineet Garg (JIRA) Fri, 02 Jun 2017 15:00:46 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035498#comment-16035498
 ]


Vineet Garg commented on CALCITE-1494:
--------------------------------------

Hi [~julianhyde]
Do you remember why this optimization was done only for equality predicates? 
e.g. {{findCorrelationEquivalent}} here [LINK to diff | 
https://git1-us-west.apache.org/repos/asf?p=calcite.git;a=blobdiff;f=core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java;h=18871e1500785f244c0a06b06f42f469ae4ad040;hp=0e6bd6ac8a17944495a3dc867b5ac71697c74352;hb=73e437fe;hpb=052f854594f92fd52a142c53998b83d70302075b]
 skips any Rex node that is not of {{EQUAL}} type. As a result we still end up 
with a value generator (a join with the outer query) for queries that have 
non-equality correlated predicates, e.g. {{select sal from emp where empno IN 
(select deptno from dept where emp.job <> dept.name)}}. Would it be safe to 
apply this optimization to non-equality predicates as well, so that no value 
generator is produced for them?
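
To make the question concrete, here is a minimal sketch that runs the 
non-equality query three ways on SQLite (not Calcite): as written, rewritten 
with a value generator, and rewritten as a semi-join without one. The table 
contents and the two hand-written rewrites are my own assumptions, not plans 
produced by RelDecorrelator, and agreement on one small data set is not a 
proof (NULL handling in particular is not exercised).

```python
import sqlite3

# Hypothetical sample data; only the columns used by the query are modelled.
SCHEMA = """
CREATE TABLE emp (empno INTEGER, job TEXT, sal INTEGER);
CREATE TABLE dept (deptno INTEGER, name TEXT);
INSERT INTO emp VALUES (10, 'SALES', 100), (20, 'CLERK', 200),
                       (30, 'SALES', 300), (40, 'CLERK', 400);
INSERT INTO dept VALUES (10, 'SALES'), (20, 'SALES'),
                        (20, 'ADMIN'), (30, 'CLERK');
"""

# The query from the comment, with the non-equality correlation.
CORRELATED = """
SELECT sal FROM emp
WHERE empno IN (SELECT deptno FROM dept WHERE emp.job <> dept.name)
"""

# Hand-written rewrite WITH a value generator: the distinct outer JOB values
# are joined with DEPT, grouped, and joined back to EMP.
WITH_VALUE_GENERATOR = """
SELECT e.sal FROM emp e
JOIN (SELECT d.deptno, v.job
      FROM dept d
      JOIN (SELECT DISTINCT job FROM emp) v ON v.job <> d.name
      GROUP BY d.deptno, v.job) s
  ON e.job = s.job AND e.empno = s.deptno
"""

# Hand-written rewrite WITHOUT a value generator: a semi-join of EMP with DEPT
# (spelled with EXISTS here) that carries the non-equality condition itself.
WITHOUT_VALUE_GENERATOR = """
SELECT e.sal FROM emp e
WHERE EXISTS (SELECT 1 FROM dept d
              WHERE d.deptno = e.empno AND e.job <> d.name)
"""


def run(sql):
    """Run one query against a fresh in-memory database; return sorted SAL values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return sorted(row[0] for row in conn.execute(sql))
    finally:
        conn.close()


if __name__ == "__main__":
    for label, sql in [("correlated", CORRELATED),
                       ("with value generator", WITH_VALUE_GENERATOR),
                       ("without value generator", WITHOUT_VALUE_GENERATOR)]:
        print(label, run(sql))
```

The semi-join form matters here: a plain inner join on {{e.job <> d.name}} 
could return an EMP row once per matching DEPT row, which the IN query never 
does.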

> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
>                 Key: CALCITE-1494
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
>              Labels: sub-query
>             Fix For: 1.12.0
>
>
> For correlated queries such as 
> {noformat} select sal from emp where empno IN (select deptno from dept where 
> emp.job = dept.name) {noformat}
> Calcite generates the following plan (SubqueryRemove Rule + Decorrelation) 
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can see, there is a scan of the outer table (EMP in this case) to 
> retrieve all distinct values of the correlated column (EMP.JOB here), which 
> are then joined with the inner table (DEPT). 
> I am not sure why this step is required. After this join Calcite does a 
> group by anyway to generate all distinct values of the correlated and result 
> columns (DEPTNO, JOB), and the result is then joined with the outer table. 
> I think the scan of the outer table and its join with the inner table to 
> generate the correlated values are unnecessary.
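
The redundancy described above can be checked on a small data set. This is 
only a sketch, assuming SQLite instead of Calcite, made-up table contents, and 
two hand-written SQL renderings of the plan: one that mirrors the plan shown 
(with the extra EMP scan) and one that groups DEPT directly. Matching results 
on one data set illustrate the point but do not prove equivalence.

```python
import sqlite3

# Hypothetical sample data; DEPT deliberately contains a duplicate row.
SETUP = """
CREATE TABLE emp (empno INTEGER, job TEXT, sal INTEGER);
CREATE TABLE dept (deptno INTEGER, name TEXT);
INSERT INTO emp VALUES (10, 'SALES', 100), (20, 'CLERK', 200),
                       (30, 'SALES', 300), (10, 'CLERK', 400);
INSERT INTO dept VALUES (10, 'SALES'), (20, 'SALES'),
                        (30, 'SALES'), (10, 'SALES');
"""

# The original correlated query from the issue description.
ORIGINAL = """
SELECT sal FROM emp
WHERE empno IN (SELECT deptno FROM dept WHERE emp.job = dept.name)
"""

# Mirrors the plan above: distinct EMP.JOB values are joined with DEPT,
# grouped by (DEPTNO, JOB), then joined back to EMP.
PLAN_WITH_EXTRA_SCAN = """
SELECT e.sal FROM emp e
JOIN (SELECT d.deptno, v.job
      FROM dept d
      JOIN (SELECT DISTINCT job FROM emp) v ON d.name = v.job
      GROUP BY d.deptno, v.job) s
  ON e.job = s.job AND e.empno = s.deptno
"""

# The simplification suggested in the issue: group DEPT directly, with no
# scan of EMP on the inner side.
PLAN_WITHOUT_EXTRA_SCAN = """
SELECT e.sal FROM emp e
JOIN (SELECT deptno, name FROM dept GROUP BY deptno, name) s
  ON e.job = s.name AND e.empno = s.deptno
"""


def query(sql):
    """Run one query against a fresh in-memory database; return sorted SAL values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return sorted(row[0] for row in conn.execute(sql))
    finally:
        conn.close()


if __name__ == "__main__":
    print(query(ORIGINAL))
    print(query(PLAN_WITH_EXTRA_SCAN))
    print(query(PLAN_WITHOUT_EXTRA_SCAN))
```

With an equality predicate, the inner join with the distinct EMP.JOB values 
only filters DEPT rows that the final join with EMP would reject anyway, which 
is why dropping it leaves the result unchanged in this example.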



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
