[ 
https://issues.apache.org/jira/browse/CALCITE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939776#comment-17939776
 ] 

Mihai Budiu commented on CALCITE-6927:
--------------------------------------

IS NOT DISTINCT FROM works great with hash-joins. In fact, it works better than 
EQUALS.
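
One possible reading of the point above, as a toy sketch rather than Calcite's or any engine's actual implementation: with IS NOT DISTINCT FROM semantics a NULL key is just another value to hash, whereas plain equality has to filter NULL keys out because NULL = NULL is never true. The function `hash_join` and its `null_safe` flag are hypothetical names used only for illustration, and Python's `None` stands in for SQL NULL.

```python
from collections import defaultdict


def hash_join(left, right, null_safe):
    """Inner-join two lists of (key, payload) rows on key via a hash table.

    null_safe=True  mimics `l.key IS NOT DISTINCT FROM r.key`.
    null_safe=False mimics `l.key = r.key`, where NULL keys never match.
    """
    # Build phase: hash the right-hand rows by key.
    table = defaultdict(list)
    for key, payload in right:
        if key is None and not null_safe:
            continue  # under `=`, a NULL key can never produce a match
        table[key].append(payload)

    # Probe phase: look up each left-hand key in the hash table.
    out = []
    for key, payload in left:
        if key is None and not null_safe:
            continue  # same NULL filter on the probe side
        for other in table.get(key, []):
            out.append((key, payload, other))
    return out


left_rows = [(1, "a"), (None, "b")]
right_rows = [(1, "x"), (None, "y")]
null_safe_result = hash_join(left_rows, right_rows, null_safe=True)
equals_result = hash_join(left_rows, right_rows, null_safe=False)
```

In this sketch the null-safe variant needs no special handling of NULL keys at all; only the equality variant carries the extra filter.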

> Add rule for join condition remove IS NOT DISTINCT FROM
> -------------------------------------------------------
>
>                 Key: CALCITE-6927
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6927
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>
> Following Spark's conversion approach, IS NOT DISTINCT FROM can be 
> rewritten as `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = 
> isnull(y))`, so that a join with an IS NOT DISTINCT FROM condition can use 
> HashJoin instead of NestedLoopJoin when the logical plan is converted to the 
> physical plan.  
> The SQL is as follows:
> {code:sql}
> explain
> select t1.age from user_profiles as t1
> join user_profiles t2
> on t1.user_id <=> t2.user_id;
> {code}
> The Spark plan is as follows:
> {code:java}
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [age#6]
>    +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], 
> [coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
>       :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, 
> Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, 
> Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], 
> PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
>       +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, 
> string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
>          +- FileScan orc default.user_profiles[user_id#29] Batched: true, 
> Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, 
> Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], 
> PushedFilters: [], ReadSchema: struct<user_id:string>{code}
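
The rewrite quoted in the description can be sanity-checked outside any SQL engine. The following is a minimal sketch, assuming string-typed keys (so `''` is a valid coalesce default) and using Python's `None` for SQL NULL. The helper names `is_not_distinct_from`, `coalesce`, and `rewritten` are made up for this example and are not Calcite or Spark APIs.

```python
from itertools import product


def is_not_distinct_from(x, y):
    """SQL `x IS NOT DISTINCT FROM y`: two NULLs compare as equal."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def coalesce(value, default):
    """SQL COALESCE for two arguments."""
    return default if value is None else value


def rewritten(x, y):
    """(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))"""
    return coalesce(x, "") == coalesce(y, "") and (x is None) == (y is None)


# Include NULL and the empty string: the isnull comparison exists precisely
# to keep these two apart after coalesce maps NULL to ''.
SAMPLES = [None, "", "a", "b"]
mismatches = [
    (x, y)
    for x, y in product(SAMPLES, repeat=2)
    if is_not_distinct_from(x, y) != rewritten(x, y)
]
print(mismatches)
```

Both sides of the rewritten condition are plain equalities on non-NULL expressions, which is what allows them to serve as hash-join keys, as in the `BroadcastHashJoin` keys of the Spark plan above.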



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
