[
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037300#comment-16037300
]
Julian Hyde commented on CALCITE-1494:
--------------------------------------
When de-correlating a query, you need to generate an un-correlated sub-query
that returns all possible values of the correlating variable as one of its
columns (or a super-set). If the condition is "t1.c1 = 10" then the list of
values is obviously {10}, and if the condition is "t1.c1 = t2.c2" then the list
of values is the same as the values in t2.c2. But if the condition is "t1.c1 >
t2.c2" then the list of possible values is infinite.
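As a rough illustration of the two finite cases above, here is a minimal Python sketch. The toy rows and the helper names (values_for_constant, values_for_equality) are hypothetical and are not part of Calcite; they only model "the set of values the correlating variable can take".

```python
# Hypothetical toy tables; t1 is the outer table, t2 the inner one.
t1 = [{"c1": 10}, {"c1": 20}, {"c1": 30}]
t2 = [{"c2": 20}, {"c2": 30}, {"c2": 30}, {"c2": 40}]


def values_for_constant(constant):
    """Condition 't1.c1 = <constant>': the only possible value is the constant."""
    return {constant}


def values_for_equality(inner_rows, column):
    """Condition 't1.c1 = t2.c2': the possible values are the distinct t2.c2."""
    return {row[column] for row in inner_rows}


constant_values = values_for_constant(10)
equality_values = values_for_equality(t2, "c2")

# Outer rows whose correlating value appears in the generated list.
matching_rows = [row for row in t1 if row["c1"] in equality_values]
```

For "t1.c1 > t2.c2" no such finite set can be built from t2 alone on an unbounded domain, which is why this sketch has no helper for that case.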
Equality isn't the only condition that would work. We could in theory generate
lists of values for "t1.c1 between t2.c2 and t2.c2 + 10" if c1 and c2 have a
discrete domain such as integers. Then we'd only generate 10x too many values.
We could also do the rewrite for "t1.c1 is null" and for low-cardinality
domains such as tinyint, char(1), and boolean.
But equality is the only case that comes up regularly.
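The non-equality cases mentioned above could be sketched along the following lines. This is only an illustration in Python, not Calcite code: the helper names are hypothetical, the columns are assumed to be integers, and TINYINT is assumed to be a signed 8-bit type.

```python
def values_for_between(inner_rows, column, width):
    """Condition 'c1 BETWEEN c2 AND c2 + width' over an integer domain."""
    values = set()
    for row in inner_rows:
        low = row[column]
        # BETWEEN is inclusive, so each inner row contributes width + 1 integers.
        values.update(range(low, low + width + 1))
    return values


# Assumption: a signed 8-bit TINYINT, so the whole domain has 256 values.
TINYINT_DOMAIN = range(-128, 128)


def values_for_greater_than(inner_rows, column, domain):
    """Condition 'c1 > c2' when c1 has a small, enumerable domain."""
    return {v for v in domain if any(v > row[column] for row in inner_rows)}


t2 = [{"c2": 5}, {"c2": 100}]
between_values = values_for_between(t2, "c2", 10)
greater_values = values_for_greater_than(t2, "c2", TINYINT_DOMAIN)
```

With the inclusive bounds used here, each t2 row expands to 11 candidate values, in line with the roughly 10x blow-up described above.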
> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
> Key: CALCITE-1494
> URL: https://issues.apache.org/jira/browse/CALCITE-1494
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Vineet Garg
> Assignee: Julian Hyde
> Labels: sub-query
> Fix For: 1.12.0
>
>
> For correlated queries such as
> {noformat} select sal from emp where empno IN (select deptno from dept where
> emp.job = dept.name) {noformat}
> Calcite generates the following plan (SubqueryRemove Rule + Decorrelation):
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
>       SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can see, there is an outer table scan (EMP in this case) to retrieve
> all distinct values of the correlated column (EMP.JOB here), which is then
> joined with the inner table (DEPT).
> I am not sure why this step is required. After this join, Calcite does a
> group by anyway to generate all distinct values of the correlated and result
> columns (DEPTNO, JOB), which are then joined with the outer table.
> I think the scan of the outer table and its join with the inner table to
> generate the correlated values are unnecessary.
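To make the plan described in the issue easier to follow, here is a small Python sketch that evaluates the query three ways on toy data: row by row (correlated), following the shape of the generated plan, and with the extra outer scan dropped as the reporter suggests. The rows and function names are hypothetical, only the columns used by the query are modelled, and agreement on this toy data does not prove the rewrite valid in general.

```python
# Hypothetical toy rows, not taken from the issue.
emp = [
    {"empno": 10, "job": "SALES", "sal": 1000},
    {"empno": 20, "job": "SALES", "sal": 2000},
    {"empno": 30, "job": "CLERK", "sal": 3000},
]
dept = [
    {"deptno": 10, "name": "SALES"},
    {"deptno": 30, "name": "CLERK"},
    {"deptno": 40, "name": "ADMIN"},
]


def correlated(emp, dept):
    """Evaluate the IN sub-query once per outer row."""
    result = []
    for e in emp:
        deptnos = {d["deptno"] for d in dept if d["name"] == e["job"]}
        if e["empno"] in deptnos:
            result.append(e["sal"])
    return result


def plan_as_generated(emp, dept):
    """Follow the shape of the plan in the issue, bottom-up."""
    # Inner aggregate over EMP: distinct values of the correlated column JOB.
    jobs = {e["job"] for e in emp}
    # Join DEPT with those values on NAME = JOB, then group by (DEPTNO, JOB).
    pairs = {(d["deptno"], d["name"]) for d in dept if d["name"] in jobs}
    # Top join: EMP.EMPNO = DEPTNO and EMP.JOB = JOB, then project SAL.
    return [e["sal"] for e in emp if (e["empno"], e["job"]) in pairs]


def plan_without_outer_scan(emp, dept):
    """Reporter's suggestion: take (DEPTNO, NAME) from DEPT directly."""
    pairs = {(d["deptno"], d["name"]) for d in dept}
    return [e["sal"] for e in emp if (e["empno"], e["job"]) in pairs]


expected = correlated(emp, dept)
```

The third variant works here because the correlation is an equality, so the distinct DEPT.NAME values already form a super-set of the JOB values that can match.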