[
https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035498#comment-16035498
]
Vineet Garg commented on CALCITE-1494:
--------------------------------------
Hi [~julianhyde]
Do you remember why were this optimization done only for equality predicates?
e.g. {{findCorrelationEquivalent}} here [LINK to diff |
https://git1-us-west.apache.org/repos/asf?p=calcite.git;a=blobdiff;f=core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java;h=18871e1500785f244c0a06b06f42f469ae4ad040;hp=0e6bd6ac8a17944495a3dc867b5ac71697c74352;hb=73e437fe;hpb=052f854594f92fd52a142c53998b83d70302075b]
skips if Rex Node is not of {{EQUAL}} type. We end up with value generator
(join with outer query) for queries which have non-equality correlated
predicates e.g. {{select sal from emp where empno IN (select deptno from dept
where emp.job <> dept.name}}. Should it be safe to optimize this and not
generate value generator for non-equality predicates as well?
> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
> Key: CALCITE-1494
> URL: https://issues.apache.org/jira/browse/CALCITE-1494
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Vineet Garg
> Assignee: Julian Hyde
> Labels: sub-query
> Fix For: 1.12.0
>
>
> For co-related queries such as
> {noformat} select sal from emp where empno IN (select deptno from dept where
> emp.job = dept.name) {noformat}
> Calcite generates following plan (SubqueryRemove Rule + Decorrelation)
> {noformat}
> LogicalProject(SAL=[$5])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$0], JOB=[$1])
> LogicalProject(DEPTNO=[$0], JOB=[$2])
> LogicalJoin(condition=[=($2, $1)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalAggregate(group=[{0}])
> LogicalProject(JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can notice there is a outer table scan (EMP in this case) to retrieve
> all distinct values for co-related column (EMP.JOB here), which is then
> joined with inner table (DEPT).
> I am not sure why is this step required. After this join Calcite is anyway
> doing group by to generate all distinct values for correlated and result
> column (DEPTNO, JOB) which is then joined with outer table.
> I think the scan + join of outer table with inner table to generate
> co-rrelated values is un-necessary and is not required.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)