[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

Julian Hyde (JIRA) Mon, 05 Jun 2017 11:07:25 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037300#comment-16037300
 ]


Julian Hyde commented on CALCITE-1494:
--------------------------------------

When de-correlating a query, you need to generate an un-correlated sub-query 
that returns all possible values of the correlating variable as one of its 
columns (or a super-set). If the condition is "t1.c1 = 10" then the list of 
values is obviously {10}, and if the condition is "t1.c1 = t2.c2" then the list 
of values is the same as the values in t2.c2. But if the condition is "t1.c1 > 
t2.c2" then the list of possible values is infinite.
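
To make the equality case concrete, here is a minimal sketch (not Calcite 
code) that runs the query from this issue next to a hand-written de-correlated 
form, using an in-memory SQLite database through Python's sqlite3 module. The 
table contents, the reduced column lists, and the aliases d, v and s are 
assumptions made for illustration only. The sub-query v plays the role of the 
"list of possible values" of the correlating variable emp.job.

```python
import sqlite3

# Hypothetical sample data; only the columns the query needs are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, job TEXT, sal INTEGER);
    CREATE TABLE dept (deptno INTEGER, name TEXT);
    INSERT INTO emp VALUES (10, 'SALES', 1000), (20, 'SALES', 2000),
                           (30, 'CLERK', 3000), (40, 'HR', 4000);
    INSERT INTO dept VALUES (10, 'SALES'), (30, 'CLERK'), (40, 'CLERK');
""")

# The correlated query from the issue description.
correlated = """
    SELECT sal FROM emp
    WHERE empno IN (SELECT deptno FROM dept WHERE emp.job = dept.name)
    ORDER BY sal
"""

# Hand-written de-correlated form: v lists every value the correlating
# variable emp.job can take, s is the un-correlated sub-query that carries
# that value as a column, and the outer join re-applies the correlation.
decorrelated = """
    SELECT emp.sal
    FROM emp
    JOIN (SELECT DISTINCT d.deptno, v.job
          FROM dept AS d
          JOIN (SELECT DISTINCT job FROM emp) AS v ON d.name = v.job) AS s
      ON emp.job = s.job AND emp.empno = s.deptno
    ORDER BY emp.sal
"""

correlated_rows = conn.execute(correlated).fetchall()
decorrelated_rows = conn.execute(decorrelated).fetchall()
print(correlated_rows, decorrelated_rows)
```

On this sample data both statements return the same rows. With a condition 
such as "emp.job > dept.name" there is no finite sub-query that could stand in 
for v without scanning some table for candidate values.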

Equality isn't the only condition that would work. We could in theory generate 
lists of values for "t1.c1 between t2.c2 and t2.c2 + 10" if c1 and c2 have a 
discrete domain such as integers. Then we'd only generate about 10x too many 
values (11 candidates per t2.c2 value, because BETWEEN is inclusive).
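
As a rough sketch of what generating that list could look like for an integer 
domain (plain Python, not Calcite code; the function name candidate_values and 
the width parameter are hypothetical):

```python
def candidate_values(c2_values, width=10):
    """Return every integer v with c2 <= v <= c2 + width for some c2."""
    candidates = set()
    for c2 in c2_values:
        # BETWEEN is inclusive at both ends, hence width + 1 integers per c2.
        candidates.update(range(c2, c2 + width + 1))
    return candidates


# Two hypothetical values of t2.c2 whose ranges do not overlap.
values = candidate_values([5, 100])
print(len(values), min(values), max(values))
```

Each t2.c2 value contributes up to 11 candidates, so the list stays finite but 
is roughly an order of magnitude larger than in the equality case.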

We could also do the rewrite for "t1.c1 is null" and for low-cardinality 
domains such as tinyint, char(1), and boolean.

But equality is the only case that comes up regularly.

> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
>                 Key: CALCITE-1494
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
>              Labels: sub-query
>             Fix For: 1.12.0
>
>
> For correlated queries such as 
> {noformat} select sal from emp where empno IN (select deptno from dept where 
> emp.job = dept.name) {noformat}
> Calcite generates the following plan (SubqueryRemove Rule + Decorrelation): 
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can see, there is an outer table scan (EMP in this case) to retrieve 
> all distinct values of the correlated column (EMP.JOB here), which is then 
> joined with the inner table (DEPT). 
> I am not sure why this step is required. After this join, Calcite does a 
> group-by anyway to generate all distinct values of the correlated and result 
> columns (DEPTNO, JOB), which are then joined with the outer table. 
> I think the scan + join of the outer table with the inner table to generate 
> correlated values is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
