[
https://issues.apache.org/jira/browse/CALCITE-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937596#comment-17937596
]
Zhen Chen commented on CALCITE-6904:
------------------------------------
There are some issues involved here:
# Using IS_NOT_DISTINCT_FROM as HashJoin condition, it can not get right
result.
# IS_NOT_DISTINCT_FROM is directly used as the condition of NestedLoopJoin, we
will also meet an error. Because we have not implemented
IsNotDistinctFromImplementor. Here I commented out the generation of HashJoin
to force it to use NestedLoopJoin. Sorry, I didn't find an example that can
directly generate a NestedLoopJoin case.
Plan is:
EnumerableProject(commission=[$0])
EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[inner])
EnumerableProject(commission=[$4])
EnumerableTableScan(table=[[hr, emps]])
EnumerableProject(commission=[$4])
EnumerableTableScan(table=[[hr, emps]])
Error while executing SQL "select "t1"."commission" from "hr"."emps" as "t1"
join
"hr"."emps" as "t2"
on "t1"."commission" is not distinct from "t2"."commission"": Unable to
implement EnumerableCalc(expr#0..1=[\{inputs}], commission=[$t0]): rowcount =
2500.0, cumulative cost = \{27900.0 rows, 8902.0 cpu, 0.0 io}, id = 142
EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[inner]): rowcount = 2500.0, cumulative cost = \{25400.0 rows, 1402.0
cpu, 0.0 io}, id = 138
EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0,
cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 144
EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative
cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119
EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0,
cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 146
EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative
cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119
# {{t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)}}will be folded
into {{t1.key IS NOT DISTINCT FROM t2.key}} (using
collapseExpandedIsNotDistinctFromExpr). This makes it impossible to execute the
original OR expression.
# Currently, calcite recommends using IS_NOT_DISTINCT_FROM, which will be
treated as an EQUAL condition, but the difference from the EQUAL condition is
the handling of NULL. When encountering a join condition of
IS_NOT_DISTINCT_FROM, HashJoin will be used, but the null information saved by
{{filterNulls}} will be omitted during the conversion process.
Some suggestions:
# Disable IS_NOT_DISTINCT_FROM fold. These can use OR expression to execute
correctly.
# Implemented IsNotDistinctFromImplementor and IS_NOT_DISTINCT_FROM is not
considered as an EQUAL condition, let IS_NOT_DISTINCT_FROM execute correctly.
# Add NULL_AWARE parameter to make HashJoin aware of null handling behavior.
The reason why this problem is found is that there is no case in jdbctest about
using IS_NOT_DISTINCT_FROM as a join condition.I have also read some related
PRs, but I may still not understand the full picture of calcite execution, and
I may need some advice from you.
> IS_NOT_DISTINCT_FROM is converted error in EnumerableJoinRule
> -------------------------------------------------------------
>
> Key: CALCITE-6904
> URL: https://issues.apache.org/jira/browse/CALCITE-6904
> Project: Calcite
> Issue Type: Bug
> Reporter: Zhen Chen
> Assignee: Zhen Chen
> Priority: Major
> Attachments: image-2025-03-21-13-06-47-354.png
>
>
> The convert() method of EnumerableJoinRule incorrectly converts the condition
> of IS_NOT_DISTINCT_FROM , which is considered an equivalence condition in
> calcite and carries a `filterNulls` to save whether to perform equivalence
> processing on NULL. However, when converting to EnumerableJoin, the
> filterNulls are ignored, resulting in `IS_NOT_DISTINCT_FROM ($0, $1)` being
> converted to `=($0, $1)`.
>
> mysql case:
> {code:java}
> CREATE TABLE emps (
> commission int
> );
> INSERT INTO emps VALUES (1);
> INSERT INTO emps VALUES (2);
> INSERT INTO emps VALUES (3);
> INSERT INTO emps VALUES (NULL);
> SELECT t1.commission FROM emps as t1
> join
> emps as t2
> on t1.commission <=> t2.commission;
> mysql result:
> commission
> 1
> 2
> 3
> NULL {code}
> calcite jdbc case:
> {code:java}
> @Test void testIsNotDistinctFrom1() {
> final String sql = ""
> + "select \"t1\".\"commission\" from \"hr\".\"emps\" as \"t1\"\n"
> + "join\n"
> + "\"hr\".\"emps\" as \"t2\"\n"
> + "on \"t1\".\"commission\" is not distinct from
> \"t2\".\"commission\"";
> CalciteAssert.hr()
> .query(sql)
> .explainContains("")
> .returnsUnordered("commission=500",
> "commission=null");
> } {code}
> debug result:
> !image-2025-03-21-13-06-47-354.png|width=519,height=649!
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)