[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Vineet Garg (JIRA) Mon, 05 Jun 2017 11:44:29 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037367#comment-16037367
 ]


Vineet Garg commented on CALCITE-1494:
--------------------------------------

Thanks for your response [~julianhyde]. 

I am trying to understand why do we even need value generator? Currently for 
queries like {{select sal from emp where empno IN (select deptno from dept 
where emp.job = dept.name}} we skip value generator and instead do group by on 
subquery expression and correlated columns to generate possible values. 
Wouldn't doing so for all of the examples you provided above be semantically 
correct?

Could you please provide an example query where skipping value generator will 
not be semantically correct thing to do?

> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
>                 Key: CALCITE-1494
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
>              Labels: sub-query
>             Fix For: 1.12.0
>
>
> For co-related queries such as 
> {noformat} select sal from emp where empno IN (select deptno from dept where 
> emp.job = dept.name) {noformat}
> Calcite generates following plan (SubqueryRemove Rule + Decorrelation) 
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can notice there is a outer table scan (EMP in this case) to retrieve 
> all distinct values for co-related column (EMP.JOB here), which is then 
> joined with inner table (DEPT). 
> I am not sure why is this step required. After this join Calcite is anyway 
> doing group by to generate all distinct values for correlated and result 
> column (DEPTNO, JOB) which is then joined with outer table. 
> I think the scan + join of outer table with inner table to generate 
> co-rrelated values is un-necessary and is not required.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Reply via email to