[
https://issues.apache.org/jira/browse/CALCITE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-6504:
------------------------------------
Labels: pull-request-available (was: )
> JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect tree
> when using correlated sub-query in on clause of equi-join
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-6504
> URL: https://issues.apache.org/jira/browse/CALCITE-6504
> Project: Calcite
> Issue Type: Bug
> Reporter: Ian Bertolacci
> Priority: Major
> Labels: pull-request-available
>
> JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect trees
> when using a correlated sub-query as part of an equi-join condition of a join.
> For example:
> {code:sql}
> select * from T3
> join T2 on T2.C201 = (
> select max(id)
> from T1
> where T3.C301 = T1.id)
> {code}
> Has the initial tree:
> {code:none}
> 20:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
> C203=[$6], C204=[$7])
> └── 19:Join(condition=[=(SCALAR_SUBQUERY{
> │ 18:Aggregate(group=[{}], EXPR$0=[MAX($0)])
> │ └── 17:Project(ID=[$0])
> │ └── 16:Filter(condition=[=($cor0.C301, $0)])
> │ └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
> C101:Decimal(0)])
> │}, $4)], joinType=[inner])
> ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
> C302:Dimension])
> └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> {code}
> Using only the JOIN_SUB_QUERY_TO_CORRELATE rules in a hep program in
> planning, this is the resulting tree:
> {code:java}
> 25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
> C203=[$6], C204=[$7])
> └── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
> C203=[$6], C204=[$7])
> └── 39:Join(condition=[=($8, $4)], joinType=[inner])
> ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
> C302:Dimension])
> └── 37:Correlate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{1}])
> ├── 14:TableScan(table=[T2], Schema=[ID:Dimension,
> C201:Dimension, C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
> └── 33:Project(ID=[$0])
> └── 31:Filter(condition=[=($cor0.C301, $0)])
> └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
> C101:Decimal(0)])
> {code}
> Notice that the original correlation expression is between T1 and T3, but the
> rule has created a correlate between T1 and T2.
> I would have expected this tree:
> {code:none}
> 25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
> C203=[$6], C204=[$7])
> └── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
> C203=[$6], C204=[$7])
> └── 39:Join(condition=[=($3, $4)], joinType=[inner])
> ├── 37:Correlate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{1}])
> │ ├── 13:TableScan(table=[T3], Schema=[ID:Dimension,
> C301:Dimension, C302:Dimension])
> │ └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
> │ └── 33:Project(ID=[$0])
> │ └── 31:Filter(condition=[=($cor0.C301, $0)])
> │ └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
> C101:Decimal(0)])
> └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> {code}
>
> Additionally if you swap the sides of the join (making {{{}`T2 join T3`{}}}),
> you do get the correct association, but the correlate has an invalid
> `requiredColumn`:
> {code:none}
> 25:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5],
> C301=[$6], C302=[$7])
> └── 41:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5],
> C301=[$6], C302=[$7])
> └── 39:Join(condition=[=($8, $1)], joinType=[inner])
> ├── 13:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> └── 37:Correlate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{6}])
> ├── 14:TableScan(table=[T3], Schema=[ID:Dimension,
> C301:Dimension, C302:Dimension])
> └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
> └── 33:Project(ID=[$0])
> └── 31:Filter(condition=[=($cor0.C301, $0)])
> └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
> C101:Decimal(0)])
> {code}
> Here `requiredColumn` should be 1 (pointing to `C301`) but is actually 6,
> which is where `C301` would be after the join {{`T2 join T3`}}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)