[ https://issues.apache.org/jira/browse/SPARK-32649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179397#comment-17179397 ]
Leanken.Lin commented on SPARK-32649:
-------------------------------------

Feel free to just send out a PR for review. ^_^

> Optimize BHJ/SHJ inner and semi join with empty hashed relation
> ---------------------------------------------------------------
>
>                 Key: SPARK-32649
>                 URL: https://issues.apache.org/jira/browse/SPARK-32649
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Cheng Su
>            Priority: Trivial
>
> With `EmptyHashedRelation` introduced in
> [https://github.com/apache/spark/pull/29389], it occurred to me that there is a
> minor optimization we can apply to broadcast hash join (BHJ) and shuffled hash
> join (SHJ) when the build-side hashed relation is empty.
> If the build-side hashed relation is empty (i.e. the build side has no rows):
> 1. inner join: we don't need to execute the stream side at all, just return an
> empty iterator -
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L152]
> 2. semi join: we don't need to execute the stream side at all, just return an
> empty iterator -
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L227]
>
> An empty build side is not common, but when it occurs we can skip executing
> the stream side entirely, which saves query CPU and I/O.
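
For readers unfamiliar with the idea, here is a minimal, language-agnostic sketch of the short-circuit described in the issue, written in Python rather than Spark's Scala. It is not the code in `HashJoin.scala` or in any PR; the names `hash_join`, `build_hashed_relation`, and `stream_factory`, and the `(key, payload)` row shape, are all hypothetical. The stream side is passed as a factory so the sketch can show that it is never even opened when the build side is empty. NULL-key semantics and other join types are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, str]  # hypothetical row shape: (join key, payload)


def build_hashed_relation(build_rows: Iterable[Row]) -> Dict[Hashable, List[Row]]:
    """Materialize the build side into a key -> rows map."""
    relation: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in build_rows:
        relation[row[0]].append(row)
    return dict(relation)


def _probe(relation: Dict[Hashable, List[Row]],
           stream_rows: Iterable[Row],
           join_type: str) -> Iterator:
    """Probe the hashed relation with each stream-side row."""
    for row in stream_rows:
        matches = relation.get(row[0])
        if not matches:
            continue
        if join_type == "semi":
            yield row  # semi join: emit the stream row once if any match exists
        else:
            for match in matches:
                yield (row, match)  # inner join: emit one pair per match


def hash_join(build_rows: Iterable[Row],
              stream_factory: Callable[[], Iterable[Row]],
              join_type: str = "inner") -> Iterator:
    """Inner or semi hash join that skips the stream side on an empty build side."""
    if join_type not in ("inner", "semi"):
        raise ValueError(f"unsupported join type: {join_type}")
    relation = build_hashed_relation(build_rows)
    if not relation:
        # Empty build side: no stream row can match, so return an empty
        # iterator without ever calling stream_factory().
        return iter(())
    return _probe(relation, stream_factory(), join_type)
```

Calling `hash_join([], stream_factory, "inner")` or `hash_join([], stream_factory, "semi")` returns an empty iterator and never invokes `stream_factory`. The same shortcut would be wrong for left outer or anti joins, where stream rows are emitted even without a match, which is presumably why the issue limits itself to inner and semi joins.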