[
https://issues.apache.org/jira/browse/CALCITE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939776#comment-17939776
]
Mihai Budiu commented on CALCITE-6927:
--------------------------------------
IS NOT DISTINCT FROM works great with hash-joins. In fact, it works better than
EQUALS.
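
A minimal sketch of why a null-safe key can feed a hash join directly, assuming a toy in-memory build/probe over nullable string keys. The class and method names are hypothetical and are not Calcite or Spark APIs. It relies only on the fact that Java's `HashMap` accepts a null key, so NULL build rows share one bucket and NULL probe rows find them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy hash join for illustration; not Calcite's or Spark's implementation.
final class NullSafeHashJoinSketch {
    /** Returns {leftIndex, rightIndex} pairs whose keys are NOT DISTINCT (NULL matches NULL). */
    static List<int[]> join(List<String> left, List<String> right) {
        // Build side: HashMap permits a null key, so all NULL rows land in one bucket.
        Map<String, List<Integer>> build = new HashMap<>();
        for (int r = 0; r < right.size(); r++) {
            build.computeIfAbsent(right.get(r), k -> new ArrayList<>()).add(r);
        }
        // Probe side: a NULL probe key looks up the NULL bucket like any other key.
        List<int[]> out = new ArrayList<>();
        for (int l = 0; l < left.size(); l++) {
            List<Integer> matches = build.get(left.get(l));
            if (matches != null) {
                for (int r : matches) {
                    out.add(new int[] {l, r});
                }
            }
        }
        return out;
    }
}
```

With plain `=` semantics the NULL rows would have to be filtered out before the build and probe steps, since NULL never equals NULL there.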
> Add rule for join condition remove IS NOT DISTINCT FROM
> -------------------------------------------------------
>
> Key: CALCITE-6927
> URL: https://issues.apache.org/jira/browse/CALCITE-6927
> Project: Calcite
> Issue Type: Improvement
> Reporter: Zhen Chen
> Assignee: Zhen Chen
> Priority: Major
> Labels: pull-request-available
>
> Following Spark's conversion approach, IS NOT DISTINCT FROM can be rewritten
> as `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))`, so that
> a join with an IS NOT DISTINCT FROM condition can be planned as a HashJoin
> instead of a NestedLoopJoin when the logical plan is converted to the
> physical plan.
> The SQL is as follows:
> {code:sql}
> explain
> select t1.age from user_profiles as t1
> join user_profiles t2
> on t1.user_id <=> t2.user_id; {code}
> The Spark plan is as follows:
> {code:java}
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [age#6]
>    +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], [coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
>       :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
>       +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
>          +- FileScan orc default.user_profiles[user_id#29] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string>{code}
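
The rewrite quoted in the description, `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))`, can be checked against null-safe equality on a few sample values. This is a minimal sketch, assuming string keys and using Java `null` to stand in for SQL NULL. The class and method names are hypothetical and belong to neither Calcite nor Spark:

```java
import java.util.Objects;

// Hypothetical helper class for illustration; not part of Calcite or Spark.
final class NullSafeRewriteSketch {
    // x IS NOT DISTINCT FROM y, with Java null standing in for SQL NULL.
    static boolean isNotDistinctFrom(String x, String y) {
        return Objects.equals(x, y);
    }

    // coalesce(v, d): v unless it is NULL, otherwise d.
    static String coalesce(String v, String d) {
        return v != null ? v : d;
    }

    // The rewrite from the description: two ordinary equalities on non-null values.
    static boolean rewritten(String x, String y) {
        return coalesce(x, "").equals(coalesce(y, ""))
            && ((x == null) == (y == null));
    }

    // Only the coalesce half, to show why the isnull comparison is needed.
    static boolean coalesceOnly(String x, String y) {
        return coalesce(x, "").equals(coalesce(y, ""));
    }

    public static void main(String[] args) {
        String[] samples = {null, "", "a", "b"};
        for (String x : samples) {
            for (String y : samples) {
                if (rewritten(x, y) != isNotDistinctFrom(x, y)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        // NULL and '' collide under coalesce alone; the isnull half separates them.
        System.out.println(coalesceOnly(null, "") + " vs " + rewritten(null, ""));
    }
}
```

Both halves of the rewritten condition are plain equalities over values that are never NULL, which matches the two-part join keys `[coalesce(user_id#5, ), isnull(user_id#5)]` in the Spark plan above.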
--
This message was sent by Atlassian Jira
(v8.20.10#820010)