[ 
https://issues.apache.org/jira/browse/CALCITE-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829782#comment-16829782
 ] 

Haisheng Yuan commented on CALCITE-2969:
----------------------------------------

The PR in Github is stuffed with our comments, I will move here.
{quote}
The reason for variablesSet is correlated scalar sub-query in the ON clause. 
Similarly Filter needs variablesSet if there is a correlated scalar sub-query 
in its condition, and Project needs variablesSet for correlated scalar 
sub-query in one of the project expressions.

This stuff isn't very well tested, but it is needed because sub-query can occur 
anywhere in a query.
{quote}

If the variableSet is defined as the relations that are referenced out of 
current sub-query context, then it makes sense. Almost every operator in the 
subquery may reference a column outside of sub-query, thus it needs the 
variableSet.

But I checked the code, only Filter actually creates a variableSet and uses it. 
Project's constructor doesn't even accept variableSet. Although Join has 
variableSet, it is never used, which can cause wrong plan, like the following 
query:

{code:sql}
select empno from sales.emp as r left join sales.dept as s 
on 1= (select s.deptno from sales.emp e1 left outer join sales.dept d1 
         on e1.empno+d1.deptno = s.deptno+20)
{code}

Inital plan is:
{code:java}
LogicalProject(EMPNO=[$0])
  LogicalJoin(condition=[=(1, $SCALAR_QUERY({
LogicalProject(DEPTNO=[$cor1.DEPTNO0])
  LogicalJoin(condition=[=(+($0, $9), +($cor0.DEPTNO0, 20))], joinType=[left])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
}))], joinType=[left])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

The Project and Join inside the sub-query should have non-empty variableSet, 
but Project and Join never set its variableSet.

Final plan is obviously wrong:
{code:java}
LogicalProject(EMPNO=[$0])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10])
    LogicalJoin(condition=[=(1, $11)], joinType=[left])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalJoin(condition=[true], joinType=[left])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
        LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
          LogicalProject(DEPTNO=[$cor1.DEPTNO0])
            LogicalJoin(condition=[=(+($0, $9), +($cor0.DEPTNO0, 20))], 
joinType=[left])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
              LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

But anyway, this is off the topic for current issue. I agree we should keep the 
variablesSet. Thanks for the clarification.


> Improve design of join-like relational expressions
> --------------------------------------------------
>
>                 Key: CALCITE-2969
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2969
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Danny Chan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> The existing join-like (Join, SemiJoin, Correlate, etc.) logical and physical 
> relational expressions have a few design issues which make some parts of the 
> codebase complicated and difficult to understand.
> The goal of this ticket is to improve the design of the respective 
> expressions based on the discussion in the dev list (see thread [Join, 
> SemiJoin, 
> Correlate|https://mail-archives.apache.org/mod_mbox/calcite-dev/201903.mbox/%3C8EEA04A0-4A77-4283-BD20-B019E19AE126%40apache.org%3E]).
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to