[
https://issues.apache.org/jira/browse/CALCITE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876837#comment-17876837
]
Ian Bertolacci edited comment on CALCITE-5213 at 8/26/24 9:13 PM:
------------------------------------------------------------------
[~libenchao] I now realize the importance of CALCITE-5127. I spent some time
stripping out all of our junk to use a "unadulterated" Calcite RelBuilder, and
I was still able to replicate the issue.
I realized then that I should be using the project builders which take the
correlation variable. After doing that I get the correct tree
Initial tree: (see variables set)
{code}
8:Project(variablesSet=[[$cor0]], $f0=[$SCALAR_QUERY(
│ 7:Aggregate(group=[{}], agg#0=[SUM($0)])
│ └── 6:Project(variablesSet=[[$cor1]], $f0=[+($1, $SCALAR_QUERY(
│ │ 5:Aggregate(group=[{}], agg#0=[SUM($1)])
│ │ └── 4:Filter(condition=[=($0, $cor1.T2_foreignKey)])
│ │ └── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension,
T1_measureColumn:Decimal(0, 38), T1_foreignKey:Dimension])
│ │))])
│ └── 2:Filter(condition=[=($0, $cor0.T3_foreignKey)])
│ └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension,
T2_measureColumn:Decimal(0, 38), T2_foreignKey:Dimension])
│)])
└── 0:QueryTableScan(table=[T3], Schema=[ID:Dimension,
T3_measureColumn:Decimal(0, 38), T3_foreignKey:Dimension])
{code}
After planning only on root
{code}
23:Project($f0=[$3])
└── 21:Correlate(correlation=[$cor0], joinType=[left], requiredColumns=[{2}])
├── 0:QueryTableScan(table=[T3], Schema=[ID:Dimension,
T3_measureColumn:Decimal(0, 38), T3_foreignKey:Dimension])
└── 19:Aggregate(group=[{}], agg#0=[SUM($0)])
└── 34:Project($f0=[+($1, $3)])
└── 32:Correlate(correlation=[$cor1], joinType=[left],
requiredColumns=[{2}])
├── 15:Filter(condition=[=($0, $cor0.T3_foreignKey)])
│ └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension,
T2_measureColumn:Decimal(0, 38), T2_foreignKey:Dimension])
└── 30:Aggregate(group=[{}], agg#0=[SUM($1)])
└── 28:Filter(condition=[=($0, $cor1.T2_foreignKey)])
└── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension,
T1_measureColumn:Decimal(0, 38), T1_foreignKey:Dimension])
{code}
So that's good.
I need to double check but I think that with the original sql I was not getting
projection nodes with set correlation variables.
was (Author: ian.bertolacci):
[~libenchao] I now realize the importance of CALCITE-5127. I spent some time
stripping out all of our junk to use a "unadulterated" Calcite RelBuilder, and
I was still able to replicate the issue.
I realized then that I should be using the project builders which take the
correlation variable. After doing that I get the correct tree:
{code}
23:Project($f0=[$3])
└── 21:Correlate(correlation=[$cor0], joinType=[left], requiredColumns=[{2}])
├── 0:QueryTableScan(table=[T3], Schema=[ID:Dimension,
T3_measureColumn:Decimal(0, 38), T3_foreignKey:Dimension])
└── 19:Aggregate(group=[{}], agg#0=[SUM($0)])
└── 34:Project($f0=[+($1, $3)])
└── 32:Correlate(correlation=[$cor1], joinType=[left],
requiredColumns=[{2}])
├── 15:Filter(condition=[=($0, $cor0.T3_foreignKey)])
│ └── 1:QueryTableScan(table=[T2], Schema=[ID:Dimension,
T2_measureColumn:Decimal(0, 38), T2_foreignKey:Dimension])
└── 30:Aggregate(group=[{}], agg#0=[SUM($1)])
└── 28:Filter(condition=[=($0, $cor1.T2_foreignKey)])
└── 3:QueryTableScan(table=[T1], Schema=[ID:Dimension,
T1_measureColumn:Decimal(0, 38), T1_foreignKey:Dimension])
{code}
So that's good.
I need to double check but I think that with the original sql I was not getting
projection nodes with set correlation variables.
> PROJECT_TO_SUBQUERY producing Incorrect tree from nested correlated
> subqueries in projections with correlations in filters.
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-5213
> URL: https://issues.apache.org/jira/browse/CALCITE-5213
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.30.0, 1.36.0, 1.37.0
> Reporter: Ian Bertolacci
> Priority: Major
>
> CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE produces (what I believe to be)
> incorrect trees from nested correlated subqueries in projections.
> I'm hoping that I'm just doing something wrong and maybe someone will point
> it out.
> For example:
> {code:sql}
> SELECT (SELECT Sum(C202
> + (SELECT Sum(C101)
> FROM T1
> WHERE T1.ID = T2.C201))
> FROM T2
> WHERE T2.ID = T3.C302)
> FROM T3 {code}
> The initial RelNode tree produced from this SQL is:
> {code}
> 232:LogicalProject(EXPR$0=[
> | $SCALAR_QUERY({
> | LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> | └──LogicalProject($f0=[+($2,
> | | $SCALAR_QUERY({
> | | LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> | | └──LogicalProject(C101=[$1])
> | | └──LogicalFilter(condition=[=($0, $cor1.C201)])
> | | └──TableScan(table=[[QUERY, T1]], fields=[[ID, C101]])
> | | }))])
> | └──LogicalFilter(condition=[=($0, $cor0.C302)])
> | └──TableScan(table=[[QUERY, T2]], fields=[[ID, C201, C202, C203,
> C204]])
> | })])
> └──223:TableScan(table=[[QUERY, T3]], fields=[[ID, C301, C302]])
> {code}
> This looks ok so far, but it is important to notice the lack of variableSets
> in the projection nodes (which would appear in the filter nodes having
> correlated subqueries in their conditions).
> After applying the CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE rule via a HEP
> program the resulting tree is:
> {code}
> 270:LogicalProject(EXPR$0=[$3])
> | // Unexpected Join instead of correlate binding $cor0
> └──268:LogicalJoin(condition=[true], joinType=[left])
> ├──246:TableScan(table=[[QUERY, T3]], fields=[[ID, C301, C302]])
> └──266:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──283:LogicalProject($f0=[+($2, $5)])
> | // Correlate node correctly binding $cor1
> └──281:LogicalCorrelate(correlation=[$cor1], joinType=[left],
> requiredColumns=[{1}])
> | // $cor0 is not bound by any parent correlate node
> ├──262:LogicalFilter(condition=[=($0, $cor0.C302)])
> | └──247:TableScan(table=[[QUERY, T2]], fields=[[ID, C201, C202,
> C203, C204]])
> └──279:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──277:LogicalProject(C101=[$1])
> | // $cor1 bound by #281
> └──275:LogicalFilter(condition=[=($0, $cor1.C201)])
> └──249:TableScan(table=[[QUERY, T1]], fields=[[ID,
> C101]])
> {code}
>
> Essentially, instead of a correlate node binding $cor0 there is a join (#268)
> and there is nothing binding $cor0.
> I would have expected this:
> {code}
> 270:LogicalProject(EXPR$0=[$3])
> | // Correlate binding $cor0 and requiring C302 from the LHS (#246)
> └──299:LogicalCorrelate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{2}])
> ├──246:TableScan(table=[[QUERY, T3]], fields=[[ID, C301, C302]])
> └──266:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──283:LogicalProject($f0=[+($2, $5)])
> | // Correlate node correctly binding $cor1
> └──281:LogicalCorrelate(correlation=[$cor1], joinType=[left],
> requiredColumns=[{1}])
> | // $cor0 bound by #299
> ├──262:LogicalFilter(condition=[=($0, $cor0.C302)])
> | └──247:TableScan(table=[[QUERY, T2]], fields=[[ID, C201, C202,
> C203, C204]])
> └──279:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──277:LogicalProject(C101=[$1])
> | // $cor1 bound by #281
> └──275:LogicalFilter(condition=[=($0, $cor1.C201)])
> └──249:TableScan(table=[[QUERY, T1]], fields=[[ID,
> C101]])
> {code}
> Further, when adding CoreRules.JOIN_TO_CORRELATE in an attempt to convert the
> erroneous join into a correlation we get:
> {code}
> 322:LogicalProject(EXPR$0=[$3])
> | // Correlate incorrectly binding $cor2 (which does not appear anywhere)
> | // when it should be binding $cor0, and requiring no columns from the left
> side
> └──324:LogicalCorrelate(correlation=[$cor2], joinType=[left],
> requiredColumns=[{}])
> ├──298:TableScan(table=[[QUERY, T3]], fields=[[ID, C301, C302]])
> └──318:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──337:LogicalProject($f0=[+($2, $5)])
> | // Correlate node correctly binding $cor1
> └──335:LogicalCorrelate(correlation=[$cor1], joinType=[left],
> requiredColumns=[{1}])
> | // $cor0 is not bound by any parent correlate node
> ├──314:LogicalFilter(condition=[=($0, $cor0.C302)])
> | └──299:TableScan(table=[[QUERY, T2]], fields=[[ID, C201, C202,
> C203, C204]])
> └──333:LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
> └──331:LogicalProject(C101=[$1])
> | // $cor1 bound by #335
> └──329:LogicalFilter(condition=[=($0, $cor1.C201)])
> └──301:TableScan(table=[[QUERY, T1]], fields=[[ID,
> C101]])
> {code}
> which does replace the join with a correlate, but the correlate is incorrect;
> binding an undefined correlation variable and requiring no columns.
> Even further, (and this might be a separate issue all together), applying
> RelDecorrelator as an additional program in the sequence produces a very
> incorrect tree (both with and without the JOIN_TO_CORRELATE rule):
> {code}
> 1041:LogicalProject(EXPR$0=[$5], ID6=[$4])
> └──1039:LogicalJoin(condition=[true], joinType=[left])
> ├──1006:TableScan(table=[[QUERY, T3]], fields=[[ID, C301, C302, C303]])
> └──1037:LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)])
> └──1035:LogicalProject(ID6=[$0], $f0=[+($2, $6)])
> └──1033:LogicalJoin(condition=[=($1, $5)], joinType=[left])
> ├──1024:LogicalFilter(condition=[=($0, $0)])
> | └──1007:TableScan(table=[[QUERY, T2]],
> | fields=[[ID, C201, C202, C203, C204]])
> └──1031:LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)])
> └──1029:LogicalProject(ID=[$0], C101=[$1])
> └──1027:LogicalFilter(condition=[=($0, $0)])
> └──1009:TableScan(table=[[QUERY, T1]], fields=[[ID,
> C101]])
> {code}
> This tree now has more projection expressions than the original query, which
> is fully incorrect.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)