[
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Leonid Chistov updated CALCITE-5952:
------------------------------------
Description:
The following test will fail if added to RelOptRulesTest.java
{code:java}
@Test void testCanNotPushSemiJoinToRightJoinBranch() {
final Function<RelBuilder, RelNode> relFn = b -> b
.scan("EMP")
.scan("DEPT")
.join(JoinRelType.LEFT,
b.equals(
b.field(2, 0, "DEPTNO"),
b.field(2, 1, "DEPTNO"))
)
.scan("BONUS")
.semiJoin(b.equals(
b.field(2, 0, "DNAME"),
b.field(2, 1, "JOB")))
.build();
relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
} {code}
Produced plan will look like:
{code:java}
LogicalJoin(condition=[=($7, $8)], joinType=[left])
LogicalTableScan(table=[[scott, EMP]])
LogicalJoin(condition=[=($1, $4)], joinType=[semi])
LogicalTableScan(table=[[scott, DEPT]])
LogicalTableScan(table=[[scott, BONUS]]) {code}
Which is different from the original plan:
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
LogicalJoin(condition=[=($7, $8)], joinType=[left])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, DEPT]])
LogicalTableScan(table=[[scott, BONUS]]) {code}
This is not correct - in general case it is not correct to push semi-join to
right side of left-join.
The reason is the following:
Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows
will have *nulls* for *DEPT* columns in the result of left-join and they will
be rejected by the top semi-join.
But if we push semi-join to RHS of left-join, we are going to see rows from
*EMP* with *nulls* on the *DEPT* side in the final result.
was:
The following test will fail if added to RelOptRulesTest.java
{code:java}
@Test void testCanNotPushSemiJoinToRightJoinBranch() {
final Function<RelBuilder, RelNode> relFn = b -> b
.scan("EMP")
.scan("DEPT")
.join(JoinRelType.LEFT,
b.equals(
b.field(2, 0, "DEPTNO"),
b.field(2, 1, "DEPTNO"))
)
.scan("BONUS")
.semiJoin(b.equals(
b.field(2, 0, "DNAME"),
b.field(2, 1, "JOB")))
.build();
relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
} {code}
Produced plan will look like:
{code:java}
LogicalJoin(condition=[=($7, $8)], joinType=[left])
LogicalTableScan(table=[[scott, EMP]])
LogicalJoin(condition=[=($1, $4)], joinType=[semi])
LogicalTableScan(table=[[scott, DEPT]])
LogicalTableScan(table=[[scott, BONUS]]) {code}
Which is different from the original plan:
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
LogicalJoin(condition=[=($7, $8)], joinType=[left])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, DEPT]])
LogicalTableScan(table=[[scott, BONUS]]) {code}
This is not correct - in general case it is not correct to push semi-join to
right side of left-join.
The reason is the following:
Consider rows from `EMP` that have no matching rows in DEPT. These rows will
have `nulls` for `DEPT` columns in the result of left-join and they will be
rejected by the top semi-join.
But if we push semi-join to RHS of left-join, we are going to see rows from
`EMP` with `nulls` on the `DEPT` side in the final result.
> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---------------------------------------------------------------------------
>
> Key: CALCITE-5952
> URL: https://issues.apache.org/jira/browse/CALCITE-5952
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.35.0
> Reporter: Leonid Chistov
> Assignee: Leonid Chistov
> Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
> final Function<RelBuilder, RelNode> relFn = b -> b
> .scan("EMP")
> .scan("DEPT")
> .join(JoinRelType.LEFT,
> b.equals(
> b.field(2, 0, "DEPTNO"),
> b.field(2, 1, "DEPTNO"))
> )
> .scan("BONUS")
> .semiJoin(b.equals(
> b.field(2, 0, "DNAME"),
> b.field(2, 1, "JOB")))
> .build();
> relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
> LogicalTableScan(table=[[scott, EMP]])
> LogicalJoin(condition=[=($1, $4)], joinType=[semi])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
> LogicalTableScan(table=[[scott, EMP]])
> LogicalTableScan(table=[[scott, DEPT]])
> LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows
> will have *nulls* for *DEPT* columns in the result of left-join and they will
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from
> *EMP* with *nulls* on the *DEPT* side in the final result.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)