[ 
https://issues.apache.org/jira/browse/SPARK-32573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-32573:
--------------------------------
    Description: 
In SPARK-32290, we introduced several new types of HashedRelation:
 * EmptyHashedRelation
 * EmptyHashedRelationWithAllNullKeys

Both are currently used only in the null-aware anti join (NAAJ) scenario, but 
they could be applied to other scenarios for performance improvements.
 * EmptyHashedRelation could also be used in a normal anti join for a fast stop.
 * When AQE is on and the buildSide is EmptyHashedRelationWithAllNullKeys, we 
can convert the NAAJ to an empty LocalRelation and skip a meaningless data 
iteration, because in a single-key NAAJ a null key in the buildSide drops all 
records in the streamedSide.
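
The two cases above can be sketched in a few lines of Python. This is only a toy 
model of the join semantics, not Spark code: the Relation enum, build_relation, 
anti_join and null_aware_anti_join are hypothetical names made up for the 
illustration, and None stands in for SQL NULL.

```python
from enum import Enum
from typing import Iterable, List, Optional, Set, Tuple, Union

Row = Tuple[Optional[int], str]  # (join key, payload); None models SQL NULL


class Relation(Enum):
    """Hypothetical stand-ins for the two sentinel relations in this issue."""
    EMPTY = "EmptyHashedRelation"
    ALL_NULL_KEYS = "EmptyHashedRelationWithAllNullKeys"


def build_relation(build_keys: Iterable[Optional[int]]) -> Union[Relation, Set[int]]:
    """Classify the build side once, before any streamed row is read."""
    keys = list(build_keys)
    if not keys:
        return Relation.EMPTY
    if any(k is None for k in keys):
        return Relation.ALL_NULL_KEYS
    return set(keys)


def null_aware_anti_join(streamed: List[Row], build_keys: Iterable[Optional[int]]) -> List[Row]:
    """Single-key NAAJ (NOT IN semantics)."""
    relation = build_relation(build_keys)
    if relation is Relation.EMPTY:
        return list(streamed)  # fast stop: nothing can match, keep every row
    if relation is Relation.ALL_NULL_KEYS:
        return []              # a null build key drops all streamed rows
    return [r for r in streamed if r[0] is not None and r[0] not in relation]


def anti_join(streamed: List[Row], build_keys: Iterable[Optional[int]]) -> List[Row]:
    """Normal anti join: keep streamed rows that have no match."""
    keys = {k for k in build_keys if k is not None}
    if not keys:
        return list(streamed)  # same fast stop: no key can match
    return [r for r in streamed if r[0] is None or r[0] not in keys]
```

In both fast-stop branches the result is known from the build side alone, so no 
per-row probing of the streamed side is needed.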

This patch includes two changes:
 * Use EmptyHashedRelation to do a fast stop for a common anti join as well.
 * In AQE, eliminate the BroadcastHashJoin (NAAJ) if the buildSide is an 
EmptyHashedRelationWithAllNullKeys.
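
The second change is a plan rewrite that becomes possible once the build side 
has been materialized. A minimal sketch of the idea, assuming toy plan nodes: 
Scan, LocalRelation, BroadcastHashJoin and eliminate_null_aware_anti_join below 
are hypothetical Python classes and functions written for this illustration, 
not the Spark implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker for the sentinel relation named in this issue.
ALL_NULL_KEYS = "EmptyHashedRelationWithAllNullKeys"


@dataclass
class LocalRelation:
    rows: List[tuple] = field(default_factory=list)


@dataclass
class Scan:
    rows: List[tuple]


@dataclass
class BroadcastHashJoin:
    streamed: object         # child plan on the streamed side
    build_relation: object   # materialized build side, known at re-planning time
    null_aware_anti: bool    # True when the join is a NAAJ


def eliminate_null_aware_anti_join(plan):
    """Replace a NAAJ whose build side holds a null key with an empty relation."""
    if isinstance(plan, BroadcastHashJoin):
        if plan.null_aware_anti and plan.build_relation == ALL_NULL_KEYS:
            # The join can return no rows, so the streamed side is never scanned.
            return LocalRelation()
        streamed = eliminate_null_aware_anti_join(plan.streamed)
        return BroadcastHashJoin(streamed, plan.build_relation, plan.null_aware_anti)
    return plan
```

The rewrite only fires when the sentinel is present, so every other join is 
returned with its build side and join type unchanged.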

  was:
In [SPARK-32290|https://issues.apache.org/jira/browse/SPARK-32290], we 
introduced several new types of HashedRelation
 * EmptyHashedRelation
 * EmptyHashedRelationWithAllNullKeys

Both were used only in the NAAJ scenario. As an improvement, 
EmptyHashedRelation could also be used in a normal anti join for a fast stop, 
and in AQE we can even eliminate the anti join when we know that the buildSide 
is empty.

 

This patch includes two changes.

In non-AQE, use EmptyHashedRelation to do a fast stop for a common anti join as 
well.

In AQE, eliminate the anti join if the buildSide is an EmptyHashedRelation or 
ShuffleWriteRecord is 0.

 

        Summary: Anti Join Improvement with EmptyHashedRelation and 
EmptyHashedRelationWithAllNullKeys  (was: Eliminate Anti Join when BuildSide is 
Empty)

> Anti Join Improvement with EmptyHashedRelation and 
> EmptyHashedRelationWithAllNullKeys
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-32573
>                 URL: https://issues.apache.org/jira/browse/SPARK-32573
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Leanken.Lin
>            Priority: Minor


