[
https://issues.apache.org/jira/browse/CALCITE-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829782#comment-16829782
]
Haisheng Yuan commented on CALCITE-2969:
----------------------------------------
The PR in Github is stuffed with our comments, I will move here.
{quote}
The reason for variablesSet is correlated scalar sub-query in the ON clause.
Similarly Filter needs variablesSet if there is a correlated scalar sub-query
in its condition, and Project needs variablesSet for correlated scalar
sub-query in one of the project expressions.
This stuff isn't very well tested, but it is needed because sub-query can occur
anywhere in a query.
{quote}
If the variableSet is defined as the relations that are referenced out of
current sub-query context, then it makes sense. Almost every operator in the
subquery may reference a column outside of sub-query, thus it needs the
variableSet.
But I checked the code, only Filter actually creates a variableSet and uses it.
Project's constructor doesn't even accept variableSet. Although Join has
variableSet, it is never used, which can cause wrong plan, like the following
query:
{code:sql}
select empno from sales.emp as r left join sales.dept as s
on 1= (select s.deptno from sales.emp e1 left outer join sales.dept d1
on e1.empno+d1.deptno = s.deptno+20)
{code}
Inital plan is:
{code:java}
LogicalProject(EMPNO=[$0])
LogicalJoin(condition=[=(1, $SCALAR_QUERY({
LogicalProject(DEPTNO=[$cor1.DEPTNO0])
LogicalJoin(condition=[=(+($0, $9), +($cor0.DEPTNO0, 20))], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
}))], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
The Project and Join inside the sub-query should have non-empty variableSet,
but Project and Join never set its variableSet.
Final plan is obviously wrong:
{code:java}
LogicalProject(EMPNO=[$0])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10])
LogicalJoin(condition=[=(1, $11)], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[true], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
LogicalProject(DEPTNO=[$cor1.DEPTNO0])
LogicalJoin(condition=[=(+($0, $9), +($cor0.DEPTNO0, 20))],
joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}
But anyway, this is off the topic for current issue. I agree we should keep the
variablesSet. Thanks for the clarification.
> Improve design of join-like relational expressions
> --------------------------------------------------
>
> Key: CALCITE-2969
> URL: https://issues.apache.org/jira/browse/CALCITE-2969
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.19.0
> Reporter: Stamatis Zampetakis
> Assignee: Danny Chan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 11h 50m
> Remaining Estimate: 0h
>
> The existing join-like (Join, SemiJoin, Correlate, etc.) logical and physical
> relational expressions have a few design issues which make some parts of the
> codebase complicated and difficult to understand.
> The goal of this ticket is to improve the design of the respective
> expressions based on the discussion in the dev list (see thread [Join,
> SemiJoin,
> Correlate|https://mail-archives.apache.org/mod_mbox/calcite-dev/201903.mbox/%3C8EEA04A0-4A77-4283-BD20-B019E19AE126%40apache.org%3E]).
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)