[ https://issues.apache.org/jira/browse/CALCITE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Budiu resolved CALCITE-6927.
----------------------------------
    Fix Version/s: 1.40.0
       Resolution: Fixed

Fixed in https://github.com/apache/calcite/commit/7cea035f0533e5d420b9cec7008f26d835cb6b84
Thank you for your contribution [~jensen]

> Add rule for join condition remove IS NOT DISTINCT FROM
> -------------------------------------------------------
>
>                 Key: CALCITE-6927
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6927
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.40.0
>
>
> Following Spark's approach, IS NOT DISTINCT FROM can be rewritten as
> `(coalesce(x, '') = coalesce(y, '')) and (isnull(x) = isnull(y))`. Both
> conjuncts compare expressions that are never NULL, so they can serve as
> equi-join keys. A join with an IS NOT DISTINCT FROM condition can then use
> a HashJoin instead of a NestedLoopJoin when the logical plan is converted
> to the physical plan.
> The SQL is as follows (`<=>` is Spark's null-safe equality operator, equivalent to IS NOT DISTINCT FROM):
> {code:java}
> explain 
> select t1.age from user_profiles as t1 
> join user_profiles t2 
> on t1.user_id <=> t2.user_id;  {code}
> The Spark plan is as follows:
> {code:java}
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [age#6]
>    +- BroadcastHashJoin [coalesce(user_id#5, ), isnull(user_id#5)], [coalesce(user_id#29, ), isnull(user_id#29)], Inner, BuildRight, false
>       :- FileScan orc default.user_profiles[user_id#5,age#6] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string,age:int>
>       +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, string, true], ), isnull(input[0, string, true])),false), [plan_id=72]
>          +- FileScan orc default.user_profiles[user_id#29] Batched: true, Bucketed: false (disabled by query planner), DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<user_id:string>
> {code}
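The rewrite described in the issue can be sketched in Java. This is a minimal, hypothetical illustration over nullable `String` join keys (Java 9+ assumed), not Calcite's actual rule from the linked commit; the class `NullSafeHashJoin` and its methods are made up for this example. `rewritten` evaluates the two conjuncts, and `hashJoin` uses the pair (coalesce(v, ''), isnull(v)) as a never-null composite hash key, mirroring the two join keys in the BroadcastHashJoin above. The result is checked against a nested-loop join that evaluates IS NOT DISTINCT FROM directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: join on IS NOT DISTINCT FROM via a hash join on rewritten keys. */
final class NullSafeHashJoin {

  /** Reference semantics: x IS NOT DISTINCT FROM y (two NULLs are considered equal). */
  static boolean isNotDistinctFrom(String x, String y) {
    return Objects.equals(x, y);
  }

  /** Rewritten condition: coalesce(x, '') = coalesce(y, '') AND isnull(x) = isnull(y). */
  static boolean rewritten(String x, String y) {
    String cx = (x == null) ? "" : x;
    String cy = (y == null) ? "" : y;
    return cx.equals(cy) && (x == null) == (y == null);
  }

  /** Composite equi-join key (coalesce(v, ''), isnull(v)); neither component is ever null. */
  static Map.Entry<String, Boolean> key(String v) {
    return Map.entry(v == null ? "" : v, v == null);
  }

  /** Hash join: build a table on the right input, then probe it with each left row. */
  static List<List<Integer>> hashJoin(List<String> left, List<String> right) {
    Map<Map.Entry<String, Boolean>, List<Integer>> table = new HashMap<>();
    for (int j = 0; j < right.size(); j++) {
      table.computeIfAbsent(key(right.get(j)), k -> new ArrayList<>()).add(j);
    }
    List<List<Integer>> pairs = new ArrayList<>();
    for (int i = 0; i < left.size(); i++) {
      for (int j : table.getOrDefault(key(left.get(i)), List.of())) {
        pairs.add(List.of(i, j));
      }
    }
    return pairs;
  }

  /** Nested-loop join that evaluates IS NOT DISTINCT FROM directly, for comparison. */
  static List<List<Integer>> nestedLoopJoin(List<String> left, List<String> right) {
    List<List<Integer>> pairs = new ArrayList<>();
    for (int i = 0; i < left.size(); i++) {
      for (int j = 0; j < right.size(); j++) {
        if (isNotDistinctFrom(left.get(i), right.get(j))) {
          pairs.add(List.of(i, j));
        }
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    // Arrays.asList (unlike List.of) accepts null elements.
    List<String> left = Arrays.asList("a", null, "", "b");
    List<String> right = Arrays.asList(null, "", "a", null);

    // The rewrite must agree with IS NOT DISTINCT FROM on every pair of values.
    for (String x : left) {
      for (String y : right) {
        if (rewritten(x, y) != isNotDistinctFrom(x, y)) {
          throw new AssertionError("rewrite disagrees for " + x + ", " + y);
        }
      }
    }
    List<List<Integer>> hashed = hashJoin(left, right);
    if (!hashed.equals(nestedLoopJoin(left, right))) {
      throw new AssertionError("hash join differs from nested-loop join");
    }
    System.out.println(hashed);
  }
}
```

Running `main` prints `[[0, 2], [1, 0], [1, 3], [2, 1]]`: NULL matches only NULL, and the empty string matches only the empty string. Without the `isnull` component, the coalesced key for NULL would collide with the key for `''`, which is why the rewrite needs both conjuncts.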



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
