[
https://issues.apache.org/jira/browse/CALCITE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruben Q L resolved CALCITE-3846.
--------------------------------
Resolution: Fixed
> EnumerableMergeJoin: wrong comparison of composite key with null values
> -----------------------------------------------------------------------
>
> Key: CALCITE-3846
> URL: https://issues.apache.org/jira/browse/CALCITE-3846
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.22.0
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.23.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The problem can be reproduced with the following test in
> {{EnumerablesJoinTest.java}}:
> {code}
> @Test public void testMergeJoinWithCompositeKeyAndNull() {
> tester(false, new JdbcTest.HrSchema())
> .query("?")
> .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner -> {
> planner.addRule(EnumerableRules.ENUMERABLE_MERGE_JOIN_RULE);
> planner.removeRule(EnumerableRules.ENUMERABLE_JOIN_RULE);
> })
> .withRel(builder -> builder
> .scan("s", "emps")
> .sort(builder.field("deptno"), builder.field("commission"))
> .scan("s", "emps")
> .sort(builder.field("deptno"), builder.field("commission"))
> .join(JoinRelType.INNER,
> builder.and(
> builder.equals(
> builder.field(2, 0, "deptno"),
> builder.field(2, 1, "deptno")),
> builder.equals(
> builder.field(2, 0, "commission"),
> builder.field(2, 1, "commission"))))
> .project(
> builder.field("empid"))
> .build())
> .explainHookMatches("" // It is important that we have MergeJoin in
> the plan
> + "EnumerableCalc(expr#0..4=[{inputs}], empid=[$t0])\n"
> + " EnumerableMergeJoin(condition=[AND(=($1, $3), =($2, $4))],
> joinType=[inner])\n"
> + " EnumerableSort(sort0=[$1], sort1=[$2], dir0=[ASC],
> dir1=[ASC])\n"
> + " EnumerableCalc(expr#0..4=[{inputs}],
> proj#0..1=[{exprs}], commission=[$t4])\n"
> + " EnumerableTableScan(table=[[s, emps]])\n"
> + " EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC],
> dir1=[ASC])\n"
> + " EnumerableCalc(expr#0..4=[{inputs}], deptno=[$t1],
> commission=[$t4])\n"
> + " EnumerableTableScan(table=[[s, emps]])\n")
> .returnsUnordered("empid=100\nempid=110\nempid=150\nempid=200");
> }
> {code}
> The test fails with the following exception:
> {code}
> java.lang.IllegalStateException: mergeJoin assumes inputs sorted in ascending
> order, however [10, 1000] is greater than [10, null]
> {code}
> The problem is that {{EnumerableMergeJoin}} implementation (i.e.
> {{EnumerableDefaults#mergeJoin}}) expects its inputs to be sorted in
> ascending order, nulls last [1] (see {{EnumerableMergeJoinRule}} [2]). In
> case of a composite key, {{EnumerableMergeJoin}} will represent keys as
> {{JavaRowFormat.LIST}} [3], which is a comparable list, whose comparison is
> implemented via {{FlatLists.ComparableListImpl#compare}}. This method will
> compare both lists, item by item, but it will consider that a null item is
> less-than a non-null item [4]. This is a de-facto nulls-first collation,
> which contradicts the pre-requisite of the mergeJoin algorithm.
> [1]
> https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1966
> [2]
> https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L72
> [3]
> https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoin.java#L154
> [4]
> https://github.com/apache/calcite/blob/200a136dd3b2ca04a1d1283eb5a03a04388b1947/core/src/main/java/org/apache/calcite/runtime/FlatLists.java#L1350
--
This message was sent by Atlassian Jira
(v8.3.4#803005)