[ https://issues.apache.org/jira/browse/SPARK-32573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leanken.Lin updated SPARK-32573:
--------------------------------
    Description:
In SPARK-32290, we introduced several new types of HashedRelation:
 * EmptyHashedRelation
 * EmptyHashedRelationWithAllNullKeys

They were limited to the NAAJ (null-aware anti join) scenario only. These new HashedRelations can be applied to other scenarios for performance improvements:
 * EmptyHashedRelation can also be used in a normal anti join for a fast stop.
 * When AQE (Adaptive Query Execution) is on and the buildSide is an EmptyHashedRelationWithAllNullKeys, the NAAJ can be converted to an empty LocalRelation to skip meaningless data iteration, because in a single-key NAAJ, if a null key exists in the buildSide, all records in the streamedSide are dropped.

This patch includes two changes:
 * Use EmptyHashedRelation to do a fast stop for the common anti join as well.
 * In AQE, eliminate the BroadcastHashJoin (NAAJ) if the buildSide is an EmptyHashedRelationWithAllNullKeys.

  was:
In [SPARK-32290|https://issues.apache.org/jira/browse/SPARK-32290], we introduced several new types of HashedRelation:
 * EmptyHashedRelation
 * EmptyHashedRelationWithAllNullKeys

They were limited to the NAAJ scenario only. As an improvement, EmptyHashedRelation could also be used in a normal anti join for a fast stop, and in AQE we can even eliminate the anti join when we know that the buildSide is empty.

This patch includes two changes:
 * In non-AQE, use EmptyHashedRelation to do a fast stop for the common anti join as well.
 * In AQE, eliminate anti join if buildSide is an EmptyHashedRelation of ShuffleWriteRecord is 0

        Summary: Anti Join Improvement with EmptyHashedRelation and EmptyHashedRelationWithAllNullKeys  (was: Eliminate Anti Join when BuildSide is Empty)

> Anti Join Improvement with EmptyHashedRelation and EmptyHashedRelationWithAllNullKeys
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-32573
>                 URL: https://issues.apache.org/jira/browse/SPARK-32573
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Leanken.Lin
>            Priority: Minor
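
For readers unfamiliar with the two shortcuts described in this issue, here is a minimal sketch in plain Python. It is not Spark code: the function names, the Row type, and the use of None for SQL NULL are all assumptions made for illustration. It only models the semantics stated above: an empty build side lets an anti join return the streamed side without probing, and a null key on the build side of a single-key NAAJ drops every streamed record.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape: (join key, payload). None stands in for SQL NULL.
Row = Tuple[Optional[int], str]


def left_anti_join(streamed: Iterable[Row],
                   build_keys: Iterable[Optional[int]]) -> Iterator[Row]:
    """Normal anti join: keep streamed rows that have no match on the build side."""
    # Assumption of this model: NULL never equals anything, so null build keys
    # can be ignored when looking for matches.
    keys = {k for k in build_keys if k is not None}
    if not keys:
        # Fast stop: with nothing to match against, every streamed row survives
        # and no per-row lookup is needed.
        yield from streamed
        return
    for row in streamed:
        if row[0] is None or row[0] not in keys:
            yield row


def null_aware_anti_join(streamed: Iterable[Row],
                         build_keys: Iterable[Optional[int]]) -> Iterator[Row]:
    """Single-key null-aware anti join (NOT IN style semantics)."""
    build = list(build_keys)
    if not build:
        # Empty build side: every streamed row is returned.
        yield from streamed
        return
    if any(k is None for k in build):
        # A null key on the build side drops all streamed records, so the
        # whole join can be replaced by an empty result.
        return
    keys = set(build)
    for row in streamed:
        # A streamed row with a null key cannot be proven absent, so it is dropped.
        if row[0] is not None and row[0] not in keys:
            yield row


if __name__ == "__main__":
    rows = [(1, "a"), (None, "b"), (3, "c")]
    print(list(left_anti_join(rows, [])))
    print(list(null_aware_anti_join(rows, [1, None])))
```

In this sketch the second shortcut never touches the streamed side at all, which is the reason the issue proposes replacing the join with an empty relation under AQE once the build side is known.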