[
https://issues.apache.org/jira/browse/CALCITE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755546#comment-17755546
]
qiang.wang commented on CALCITE-5927:
-------------------------------------
[~jhyde] It's hard to provide a query involving EMP and DEPT and some derived
columns where this rule makes the wrong decision. But i can provide a test case
involving not derived column which contains self join but can't be found out by
LoptOptimizeJoinRule due to this bug.
{code:java}
@Test public void testReorderForSelfJoin() throws Exception {
final String sql = "select * \n" +
"from \"depts\" as d0\n" +
"join \"emps\" as d1\n" +
"on d0.\"deptno\" = d1.\"deptno\"\n" +
"join \"depts\" as d2\n" +
"on d0.\"deptno\" = d2.\"deptno\"\n" +
"join \"emps\" as d3\n" +
"on d2.\"deptno\" = d3.\"deptno\"";
SchemaPlus rootSchema = Frameworks.createRootSchema(true);
// we add customized table into root schema instead of using HrSchema,
// because we need statistics
rootSchema.add("depts", new AbstractTable(){
@Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return typeFactory.builder()
.add("deptno", SqlTypeName.INTEGER)
.add("added", SqlTypeName.INTEGER)
.add("name", SqlTypeName.VARCHAR)
.build();
}
@Override public Statistic getStatistic() {
return Statistics.of(245D,
ImmutableList.of(ImmutableBitSet.of(0)));
}
});
rootSchema.add("emps", new AbstractTable(){
@Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return typeFactory.builder()
.add("empid", SqlTypeName.INTEGER)
.add("deptno", SqlTypeName.INTEGER)
.add("name", SqlTypeName.VARCHAR)
.build();
}
@Override public Statistic getStatistic() {
return Statistics.of(240D,
ImmutableList.of(ImmutableBitSet.of(0)));
}
});
final FrameworkConfig config = Frameworks.newConfigBuilder()
.parserConfig(SqlParser.Config.DEFAULT.withCaseSensitive(false))
.defaultSchema(rootSchema)
.ruleSets()
.build();
final Planner planner = Frameworks.getPlanner(config);
SqlNode sqlNode = planner.parse(sql);
sqlNode = planner.validate(sqlNode);
RelNode relNode = planner.rel(sqlNode).rel;
final HepProgram program =
HepProgram.builder()
.addMatchOrder(HepMatchOrder.BOTTOM_UP)
.addGroupBegin()
.addRuleInstance(CoreRules.PROJECT_REMOVE)
.addRuleInstance(CoreRules.JOIN_PROJECT_BOTH_TRANSPOSE)
.addRuleInstance(CoreRules.PROJECT_MERGE)
.addGroupEnd()
.addRuleInstance(CoreRules.JOIN_TO_MULTI_JOIN)
.addRuleInstance(CoreRules.MULTI_JOIN_OPTIMIZE)
.build();
final HepPlanner hepPlanner = new HepPlanner(program);
hepPlanner.setRoot(relNode);
RelNode bestNode = hepPlanner.findBestExp();
final String expected = ""+
"LogicalJoin(condition=[=($3, $0)], joinType=[inner])\n"
+ " LogicalTableScan(table=[[depts]])\n"
+ " LogicalTableScan(table=[[depts]])";
assertThat(toString(bestNode), containsString(expected));
} {code}
Every one can run this test in org.apache.calcite.tools.PlannerTest. In this
case, the join factor _d0_ and _d2_ should be left node and right node of one
Join node after rule, because they are self join which can be removed further.
But now it can't.
> LoptOptimizeJoinRule has wrong condition when finding out if Self-Join keys
> are unique
> --------------------------------------------------------------------------------------
>
> Key: CALCITE-5927
> URL: https://issues.apache.org/jira/browse/CALCITE-5927
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.33.0, 1.34.0, 1.35.0
> Reporter: qiang.wang
> Assignee: qiang.wang
> Priority: Minor
>
> There's wrong condition in function _areSelfJoinKeysUnique._
>
> {code:java}
> // Make sure each key on the left maps to the same simple column as the
> // corresponding key on the right
> for (IntPair pair : joinInfo.pairs()) {
> final RelColumnOrigin leftOrigin =
> mq.getColumnOrigin(leftRel, pair.source);
> if (leftOrigin == null || !leftOrigin.isDerived()) {
> return false;
> }
> final RelColumnOrigin rightOrigin =
> mq.getColumnOrigin(rightRel, pair.target);
> if (rightOrigin == null || !rightOrigin.isDerived()) {
> return false;
> }
> if (leftOrigin.getOriginColumnOrdinal()
> != rightOrigin.getOriginColumnOrdinal()) {
> return false;
> }
> } {code}
> The wrong conditions are '{_}if (leftOrigin == null ||
> !leftOrigin.isDerived()){_}' and '{_}if (rightOrigin == null ||
> !rightOrigin.isDerived()){_}'.
> This function wants to find out if the self-join keys are unique. so for each
> self-join key, find
> _leftOrigin_ and _rightOrigin_ first, then will return false if any of them
> is null. But why it returns false when any of them is not _Derived?_ I think
> exactly the opposite is right.
> I think this is a bug comes from CALCITE-4251
> Before that PR, this function will return false only when _leftOrigin_ or
> _rightOrigin_ is null, and the function _RelMetadataQuery#getColumnOrigin_
> will return null if column is derived, so the logic is : 'this function will
> return null when _leftOrigin_ or _rightOrigin_ is null or is derived', but
> now is : 'this function will return null when _leftOrigin_ or _rightOrigin_
> is null or is not derived'.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)