Cheng Su created SPARK-32649:
--------------------------------

             Summary: Optimize BHJ/SHJ inner and semi join with empty hashed relation
                 Key: SPARK-32649
                 URL: https://issues.apache.org/jira/browse/SPARK-32649
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Cheng Su


With `EmptyHashedRelation` introduced in 
[https://github.com/apache/spark/pull/29389], there is a minor optimization we 
can apply to broadcast hash join (BHJ) and shuffled hash join (SHJ) when the 
build side hashed relation is empty.

If the build side hashed relation is empty (i.e. the build side is empty):

1. Inner join: we don't need to execute the stream side at all and can just 
return an empty iterator - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L152]

2. Semi join: we don't need to execute the stream side at all and can just 
return an empty iterator - 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L227]

It is not common for the build side to be empty, but when it is, we can skip 
executing the stream side entirely and save query CPU and IO.
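
As a rough illustration of the idea (a standalone sketch, not the code in 
HashJoin.scala), the short-circuit could look like the following. The types 
`Relation`, `EmptyRelation`, `MapRelation` and the object 
`EmptyBuildSideSketch` are hypothetical stand-ins for Spark's internal classes, 
and the stream side is modeled as a by-name iterator so that it is only 
evaluated when the build side is non-empty.

```scala
// Hypothetical stand-ins for Spark's hashed relation types; names are illustrative only.
sealed trait Relation { def get(key: Int): Option[Seq[String]] }

case object EmptyRelation extends Relation {
  def get(key: Int): Option[Seq[String]] = None
}

final case class MapRelation(rows: Map[Int, Seq[String]]) extends Relation {
  def get(key: Int): Option[Seq[String]] = rows.get(key)
}

object EmptyBuildSideSketch {
  // Inner join: emit one (stream value, build value) pair per match.
  // `stream` is a by-name parameter, so it is not evaluated in the empty case.
  def innerJoin(stream: => Iterator[(Int, String)], build: Relation): Iterator[(String, String)] =
    build match {
      case EmptyRelation => Iterator.empty
      case rel =>
        stream.flatMap { case (k, s) =>
          rel.get(k).getOrElse(Seq.empty).iterator.map(b => (s, b))
        }
    }

  // Left semi join: keep stream rows that have at least one build-side match.
  def semiJoin(stream: => Iterator[(Int, String)], build: Relation): Iterator[String] =
    build match {
      case EmptyRelation => Iterator.empty
      case rel =>
        stream.collect { case (k, s) if rel.get(k).exists(_.nonEmpty) => s }
    }
}
```

In this sketch the saving comes from never forcing `stream` when the build 
side is `EmptyRelation`; in Spark the analogous saving would come from not 
running the stream side plan.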



