[jira] [Commented] (CALCITE-5927) LoptOptimizeJoinRule has wrong condition when finding out if Self-Join keys are unique

qiang.wang (Jira) Thu, 17 Aug 2023 06:09:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755546#comment-17755546
 ]


qiang.wang commented on CALCITE-5927:
-------------------------------------

[~jhyde] It's hard to provide a query involving EMP and DEPT and some derived 
columns where this rule makes the wrong decision. But I can provide a test case 
that uses only non-derived columns: it contains a self-join that 
LoptOptimizeJoinRule fails to detect because of this bug. 
{code:java}
@Test public void testReorderForSelfJoin() throws Exception {
  final String sql = "select * \n" +
      "from \"depts\" as d0\n" +
      "join \"emps\" as d1\n" +
      "on d0.\"deptno\" = d1.\"deptno\"\n" +
      "join \"depts\" as d2\n" +
      "on d0.\"deptno\" = d2.\"deptno\"\n" +
      "join \"emps\" as d3\n" +
      "on d2.\"deptno\" = d3.\"deptno\"";

  SchemaPlus rootSchema = Frameworks.createRootSchema(true);
  // we add customized table into root schema instead of using HrSchema,
  // because we need statistics
  rootSchema.add("depts", new AbstractTable(){
    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("deptno", SqlTypeName.INTEGER)
          .add("added", SqlTypeName.INTEGER)
          .add("name", SqlTypeName.VARCHAR)
          .build();
    }
    @Override public Statistic getStatistic() {
      return Statistics.of(245D,
          ImmutableList.of(ImmutableBitSet.of(0)));
    }
  });
  rootSchema.add("emps", new AbstractTable(){
    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("empid", SqlTypeName.INTEGER)
          .add("deptno", SqlTypeName.INTEGER)
          .add("name", SqlTypeName.VARCHAR)
          .build();
    }
    @Override public Statistic getStatistic() {
      return Statistics.of(240D,
          ImmutableList.of(ImmutableBitSet.of(0)));
    }
  });

  final FrameworkConfig config = Frameworks.newConfigBuilder()
      .parserConfig(SqlParser.Config.DEFAULT.withCaseSensitive(false))
      .defaultSchema(rootSchema)
      .ruleSets()
      .build();

  final Planner planner = Frameworks.getPlanner(config);
  SqlNode sqlNode = planner.parse(sql);
  sqlNode = planner.validate(sqlNode);
  RelNode relNode = planner.rel(sqlNode).rel;

  final HepProgram program =
      HepProgram.builder()
          .addMatchOrder(HepMatchOrder.BOTTOM_UP)
          .addGroupBegin()
          .addRuleInstance(CoreRules.PROJECT_REMOVE)
          .addRuleInstance(CoreRules.JOIN_PROJECT_BOTH_TRANSPOSE)
          .addRuleInstance(CoreRules.PROJECT_MERGE)
          .addGroupEnd()
          .addRuleInstance(CoreRules.JOIN_TO_MULTI_JOIN)
          .addRuleInstance(CoreRules.MULTI_JOIN_OPTIMIZE)
          .build();

  final HepPlanner hepPlanner = new HepPlanner(program);
  hepPlanner.setRoot(relNode);
  RelNode bestNode = hepPlanner.findBestExp();
  final String expected = ""+
      "LogicalJoin(condition=[=($3, $0)], joinType=[inner])\n"
      + "        LogicalTableScan(table=[[depts]])\n"
      + "        LogicalTableScan(table=[[depts]])";
  assertThat(toString(bestNode), containsString(expected));
} {code}
Anyone can run this test in org.apache.calcite.tools.PlannerTest. In this 
case, the join factors _d0_ and _d2_ should become the left and right inputs 
of a single Join node after the rule fires, because together they form a 
self-join that can be removed later. Currently that does not happen.

> LoptOptimizeJoinRule has wrong condition when finding out if Self-Join keys 
> are unique
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5927
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5927
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.33.0, 1.34.0, 1.35.0
>            Reporter: qiang.wang
>            Assignee: qiang.wang
>            Priority: Minor
>
> There is a wrong condition in the function _areSelfJoinKeysUnique_. 
>  
> {code:java}
> // Make sure each key on the left maps to the same simple column as the
> // corresponding key on the right
> for (IntPair pair : joinInfo.pairs()) {
>   final RelColumnOrigin leftOrigin =
>       mq.getColumnOrigin(leftRel, pair.source);
>   if (leftOrigin == null || !leftOrigin.isDerived()) {
>     return false;
>   }
>   final RelColumnOrigin rightOrigin =
>       mq.getColumnOrigin(rightRel, pair.target);
>   if (rightOrigin == null || !rightOrigin.isDerived()) {
>     return false;
>   }
>   if (leftOrigin.getOriginColumnOrdinal()
>       != rightOrigin.getOriginColumnOrdinal()) {
>     return false;
>   }
> } {code}
> The wrong conditions are '{_}if (leftOrigin == null || 
> !leftOrigin.isDerived()){_}' and '{_}if (rightOrigin == null || 
> !rightOrigin.isDerived()){_}'.  
> This function checks whether the self-join keys are unique. For each 
> self-join key, it first finds _leftOrigin_ and _rightOrigin_, and returns 
> false if either of them is null. But why does it return false when either of 
> them is not _derived_? I think exactly the opposite is right.
> I think this bug comes from CALCITE-4251.
> Before that PR, this function returned false only when _leftOrigin_ or 
> _rightOrigin_ was null, and the function _RelMetadataQuery#getColumnOrigin_ 
> returned null if the column was derived. So the logic was: 'this function 
> returns false when _leftOrigin_ or _rightOrigin_ is null or is derived', but 
> now it is: 'this function returns false when _leftOrigin_ or _rightOrigin_ 
> is null or is not derived'.
>  
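
To make the difference between the two conditions in the quoted description concrete, here is a minimal standalone sketch that does not depend on Calcite. The classes Origin and KeyPair and the methods currentCheck and proposedCheck are hypothetical stand-ins, not Calcite APIs; Origin models only the two properties of RelColumnOrigin that the loop reads. The "proposed" variant assumes the fix is simply to drop the negation on isDerived(), which is one reading of "exactly the opposite is right" and is not confirmed by the issue.

```java
import java.util.Collections;
import java.util.List;

public class SelfJoinKeyCheckSketch {

  /** Hypothetical stand-in for RelColumnOrigin (ordinal and derived flag only). */
  static final class Origin {
    final int ordinal;
    final boolean derived;

    Origin(int ordinal, boolean derived) {
      this.ordinal = ordinal;
      this.derived = derived;
    }
  }

  /** Hypothetical join key pair; an origin is null when no single origin exists. */
  static final class KeyPair {
    final Origin left;
    final Origin right;

    KeyPair(Origin left, Origin right) {
      this.left = left;
      this.right = right;
    }
  }

  /** Mirrors the condition quoted in the issue: rejects keys that are NOT derived. */
  static boolean currentCheck(List<KeyPair> pairs) {
    for (KeyPair p : pairs) {
      if (p.left == null || !p.left.derived) {
        return false;
      }
      if (p.right == null || !p.right.derived) {
        return false;
      }
      if (p.left.ordinal != p.right.ordinal) {
        return false;
      }
    }
    return true;
  }

  /** Assumed fix (not confirmed by the issue): rejects keys that ARE derived. */
  static boolean proposedCheck(List<KeyPair> pairs) {
    for (KeyPair p : pairs) {
      if (p.left == null || p.left.derived) {
        return false;
      }
      if (p.right == null || p.right.derived) {
        return false;
      }
      if (p.left.ordinal != p.right.ordinal) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // A plain column joined to the same plain column, like d0.deptno = d2.deptno.
    List<KeyPair> simpleKeys = Collections.singletonList(
        new KeyPair(new Origin(0, false), new Origin(0, false)));
    System.out.println("current:  " + currentCheck(simpleKeys));
    System.out.println("proposed: " + proposedCheck(simpleKeys));
  }
}
```

With a simple, non-derived key on both sides, the condition as quoted rejects the pair while the assumed fix accepts it, which matches the behaviour the test case in the comment is meant to expose.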



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
