[ 
https://issues.apache.org/jira/browse/CALCITE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18054997#comment-18054997
 ] 

Ian Bertolacci commented on CALCITE-7405:
-----------------------------------------

I guess really the issue is that SqlToRelConverter builds in two steps:
1. It creates the projection with all projection expressions (including 
correlated subqueries) with no correlation ids.
2. It checks the expressions for correlation variables, and re-creates the 
projection with the correlation id (singular).

but at this point, its too late, the RelBuilder.project logic has eliminated 
the input projection, so the reference is invalid.
[RelBuilder has explicit logic to handle the situation where the correlation 
variables are 
defined|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2129-L2133]

> SqlToRelConverter creates invalid plan when converting SQL with correlated 
> subquery and RelBuilder engages bloat optimizations
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7405
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7405
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.38.0, 1.39.0, 1.40.0, 1.41.0
>            Reporter: Ian Bertolacci
>            Assignee: Ian Bertolacci
>            Priority: Blocker
>              Labels: pull-request-available
>
> In 
> [SqlToRelConverter.convertNonAggregateSelectList|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L5021],
>  the converter will build the current projection, but then use the 
> BlackBoard's root node as the input and rebuilds a Project node using the 
> previously built Projects expressions.
> The issue is that the initial Project may be the result of Project-Project 
> fusion (because of bloat optimization) causing the resulting Project's 
> RexInputRefs to be adjusted, and so using those expressions on top of 
> BlackBoard's root node (which was eliminated by the fusion) is invalid, as 
> the references are mis-aligned.
> A trivial example:
> {code:sql}
> select C0_12 + (select sum(C0_32) from T3 where T2.id = T3.C0_31)
> from T2 join T1 on T2.C0_21 + 1 = T1.id
> {code}
> During the conversion, we get to RelBuilder.project_ 
> ([L2150|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2150])
> {code:java}
> /* This project elides the compound join expression */
> 7:LogicalProject(ID=[$0:ID], C0_21=[$1:C0_21], C0_22=[$2:C0_22], 
> ID0=[$4:ID0], C0_12=[$5:C0_12], C0_13=[$6:C0_13], C0_14=[$7:C0_14])
> └── 6:LogicalJoin(condition=[=($3:LHS.$f3, $4:T1.ID)], joinType=[inner])
>     ├── 5:LogicalProject(ID=[$0:T2.ID], C0_21=[$1:T2.C0_21], 
> C0_22=[$2:T2.C0_22], $f3=[+($1:T2.C0_21, 1)])
>     │   └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension, 
> C0_21:Dimension, C0_22:Decimal(3, 3)])
>     └── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension, C0_12:Decimal(3, 
> 3), C0_13:Text, C0_14:Timestamp])
> {code}
> and is holding this projection expression:
> {code:java}
> +($4:C0_12, $SCALAR_QUERY({ ... }))
> {code}
> Then it engages RelOptUtil.pushPastProjectUnlessBloat, which detects that it 
> can fuse the projections of the working projections and the input 
> Projection's projections and produces, which shifts $4 to $5, and the result 
> is this Project tree:
> {code:java}
> 13:LogicalProject(EXPR$0=[+($5:C0_12 /*shifted, but references valid field 
> */, $SCALAR_QUERY({...}))])
> │ /* Join expression eliding project eliminated */
> └── 6:LogicalJoin(condition=[=($3:LHS.$f3, $4:T1.ID)], joinType=[inner])
>     ├── 5:LogicalProject(ID=[$0:T2.ID], C0_21=[$1:T2.C0_21], 
> C0_22=[$2:T2.C0_22], $f3=[+($1:T2.C0_21, 1)])
>     │   └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension, 
> C0_21:Dimension, C0_22:Decimal(3, 3)])
>     └── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension, C0_12:Decimal(3, 
> 3), C0_13:Text, C0_14:Timestamp])
> {code}
> This is still a valid tree
> This is then propogated up to SqlToRelConverter.convertNonAggregateSelectList 
> ([L5012|http://example.com|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L5012]),
>  and it then detects that there is a correlation, and that it should add the 
> correlation variable to that Project. To do this it will call 
> RelBuilder.projectNamed. Since the node was already built, it needs to push 
> some node onto the RelBuilder stack. [It uses the Blackboard's 
> root|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L5021],
>  which happens to be the Project which was eliminated by the fusion. When it 
> then uses the working Project's expressions, they now reference the incorrect 
> position:
> {code:java}
> 18:LogicalProject(variablesSet=[[$cor0]], EXPR$0=[+($5:C0_13 /* reference now 
> points to a different field */, $SCALAR_QUERY({...})])
> │ /* Join expression eliding project re-introduced */
> └── 17:LogicalProject(ID=[$0:ID], C0_21=[$1:C0_21], C0_22=[$2:C0_22], 
> ID0=[$4:ID0], C0_12=[$5:C0_12], C0_13=[$6:C0_13], C0_14=[$7:C0_14])
>     └── 16:LogicalJoin(condition=[=($3:LHS.$f3, $4:T1.ID)], joinType=[inner])
>         ├── 15:LogicalProject(ID=[$0:T2.ID], C0_21=[$1:T2.C0_21], 
> C0_22=[$2:T2.C0_22], $f3=[+($1:T2.C0_21, 1)])
>         │   └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension, 
> C0_21:Dimension, C0_22:Decimal(3, 3)])
>         └── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension, 
> C0_12:Decimal(3, 3), C0_13:Text, C0_14:Timestamp])
> {code}
> In this case, there are other layers (in our runtime, at least) which will 
> fail in type-checking because this type is incorrect for addition, but it is 
> completely possible that the invalid reference has the correct type, and no 
> issue is detected.
> My belief is that the the converter should push the working Project's input 
> onto the stack:
> {code:java}
> a/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java 
> b/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java
> index 8fc83104c..6cc74eef5 100644
> --- a/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java
> +++ b/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java
> @@ -5018,7 +5018,7 @@ private void convertNonAggregateSelectList(
>        // correlation variables have been normalized in p.r, we should use 
> expressions
>        // in p.r instead of the original exprs
>        Project project1 = (Project) p.r;
> -      r = relBuilder.push(bb.root())
> +      r = relBuilder.push(project1.getInput())
>            .projectNamed(project1.getProjects(), uniqueFieldNames, true,
>                ImmutableSet.of(p.id))
>            .build();
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to