[ 
https://issues.apache.org/jira/browse/CALCITE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonid Chistov updated CALCITE-5952:
------------------------------------
    Description: 
The following test will fail if added to RelOptRulesTest.java
{code:java}
@Test void testCanNotPushSemiJoinToRightJoinBranch() {
  final Function<RelBuilder, RelNode> relFn = b -> b
      .scan("EMP")
      .scan("DEPT")
      .join(JoinRelType.LEFT,
          b.equals(
              b.field(2, 0, "DEPTNO"),
              b.field(2, 1, "DEPTNO"))
      )
      .scan("BONUS")
      // semi join only relates to RHS fields of left join
      .semiJoin(b.equals(
          b.field(2, 0, "DNAME"),
          b.field(2, 1, "JOB")))
      .build();
  relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
} {code}
Produced plan will look like:
{code:java}
LogicalJoin(condition=[=($7, $8)], joinType=[left])
  LogicalTableScan(table=[[scott, EMP]])
  LogicalJoin(condition=[=($1, $4)], joinType=[semi])
    LogicalTableScan(table=[[scott, DEPT]])
    LogicalTableScan(table=[[scott, BONUS]]) {code}
Which is different from the original plan:
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
  LogicalJoin(condition=[=($7, $8)], joinType=[left])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])
  LogicalTableScan(table=[[scott, BONUS]]) {code}
This is not correct - in general case it is not correct to push semi-join to 
right side of left-join.

The reason is the following:

Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
will have *nulls* for *DEPT* columns in the result of left-join and they will 
be rejected by the top semi-join.

But if we push semi-join to RHS of left-join, we are going to see rows from 
*EMP* with *nulls* on the *DEPT* side in the final result.

  was:
The following test will fail if added to RelOptRulesTest.java
{code:java}
@Test void testCanNotPushSemiJoinToRightJoinBranch() {
  final Function<RelBuilder, RelNode> relFn = b -> b
      .scan("EMP")
      .scan("DEPT")
      .join(JoinRelType.LEFT,
          b.equals(
              b.field(2, 0, "DEPTNO"),
              b.field(2, 1, "DEPTNO"))
      )
      .scan("BONUS")
      .semiJoin(b.equals(
          b.field(2, 0, "DNAME"),
          b.field(2, 1, "JOB")))
      .build();
  relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
} {code}
Produced plan will look like:
{code:java}
LogicalJoin(condition=[=($7, $8)], joinType=[left])
  LogicalTableScan(table=[[scott, EMP]])
  LogicalJoin(condition=[=($1, $4)], joinType=[semi])
    LogicalTableScan(table=[[scott, DEPT]])
    LogicalTableScan(table=[[scott, BONUS]]) {code}
Which is different from the original plan:
{code:java}
LogicalJoin(condition=[=($9, $12)], joinType=[semi])
  LogicalJoin(condition=[=($7, $8)], joinType=[left])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])
  LogicalTableScan(table=[[scott, BONUS]]) {code}
This is not correct - in general case it is not correct to push semi-join to 
right side of left-join.

The reason is the following:

Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
will have *nulls* for *DEPT* columns in the result of left-join and they will 
be rejected by the top semi-join.

But if we push semi-join to RHS of left-join, we are going to see rows from 
*EMP* with *nulls* on the *DEPT* side in the final result.


> Semi-Join incorrectly reordered with Left-Join by SemiJoinJoinTransposeRule
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-5952
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5952
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.35.0
>            Reporter: Leonid Chistov
>            Assignee: Leonid Chistov
>            Priority: Major
>
> The following test will fail if added to RelOptRulesTest.java
> {code:java}
> @Test void testCanNotPushSemiJoinToRightJoinBranch() {
>   final Function<RelBuilder, RelNode> relFn = b -> b
>       .scan("EMP")
>       .scan("DEPT")
>       .join(JoinRelType.LEFT,
>           b.equals(
>               b.field(2, 0, "DEPTNO"),
>               b.field(2, 1, "DEPTNO"))
>       )
>       .scan("BONUS")
>       // semi join only relates to RHS fields of left join
>       .semiJoin(b.equals(
>           b.field(2, 0, "DNAME"),
>           b.field(2, 1, "JOB")))
>       .build();
>   relFn(relFn).withRule(CoreRules.SEMI_JOIN_JOIN_TRANSPOSE).checkUnchanged();
> } {code}
> Produced plan will look like:
> {code:java}
> LogicalJoin(condition=[=($7, $8)], joinType=[left])
>   LogicalTableScan(table=[[scott, EMP]])
>   LogicalJoin(condition=[=($1, $4)], joinType=[semi])
>     LogicalTableScan(table=[[scott, DEPT]])
>     LogicalTableScan(table=[[scott, BONUS]]) {code}
> Which is different from the original plan:
> {code:java}
> LogicalJoin(condition=[=($9, $12)], joinType=[semi])
>   LogicalJoin(condition=[=($7, $8)], joinType=[left])
>     LogicalTableScan(table=[[scott, EMP]])
>     LogicalTableScan(table=[[scott, DEPT]])
>   LogicalTableScan(table=[[scott, BONUS]]) {code}
> This is not correct - in general case it is not correct to push semi-join to 
> right side of left-join.
> The reason is the following:
> Consider rows from *EMP* that have no matching rows in {*}DEPT{*}. These rows 
> will have *nulls* for *DEPT* columns in the result of left-join and they will 
> be rejected by the top semi-join.
> But if we push semi-join to RHS of left-join, we are going to see rows from 
> *EMP* with *nulls* on the *DEPT* side in the final result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to