[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Ashutosh Chauhan (JIRA) Tue, 06 Jun 2017 08:26:35 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039079#comment-16039079
 ]


Ashutosh Chauhan commented on CALCITE-1494:
-------------------------------------------

Per this classic ; 
https://pdfs.semanticscholar.org/650f/dd065b407e0a4fdd854b1861e58eed289955.pdf 
question [~vgarg] is raising does seem fair. Why do we need 'value generator' ? 
If you look at query 3 on page 3 of the paper it has non-equi join predicate on 
correlated variable. Rewrite on page 4 is doing semijoin between inner and 
outer query with non-equi join predicate to get all qualifying values and then 
doing join of that with outer query. It does seem like generating 'all possible 
values of the correlating variable as one of its columns (or a super-set)' is 
wasteful.

> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
>                 Key: CALCITE-1494
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
>              Labels: sub-query
>             Fix For: 1.12.0
>
>
> For co-related queries such as 
> {noformat} select sal from emp where empno IN (select deptno from dept where 
> emp.job = dept.name) {noformat}
> Calcite generates following plan (SubqueryRemove Rule + Decorrelation) 
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can notice there is a outer table scan (EMP in this case) to retrieve 
> all distinct values for co-related column (EMP.JOB here), which is then 
> joined with inner table (DEPT). 
> I am not sure why is this step required. After this join Calcite is anyway 
> doing group by to generate all distinct values for correlated and result 
> column (DEPTNO, JOB) which is then joined with outer table. 
> I think the scan + join of outer table with inner table to generate 
> co-rrelated values is un-necessary and is not required.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Reply via email to