Cheng Su created SPARK-37341:
--------------------------------

             Summary: Avoid unnecessary buffer and copy in full outer sort merge join
                 Key: SPARK-37341
                 URL: https://issues.apache.org/jira/browse/SPARK-37341
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Cheng Su


FULL OUTER sort merge join (non-code-gen path) copies join keys and buffers 
input rows, even when rows from both sides do not have matched keys 
([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1637-L1641]
 ). This is unnecessary: we can output the row with the smaller join key 
directly, and buffer only when both sides have matched keys. This avoids 
unnecessary copying and buffering when both join sides have many rows that 
do not match each other.
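
The proposed behavior can be illustrated with a minimal sketch. This is not 
Spark's implementation: the function name full_outer_merge_join, the 
(key, payload) row shape, and the use of in-memory lists are all assumptions 
made for illustration, and null keys and extra join conditions are ignored. 
It only shows the idea that unmatched rows are emitted right away, while 
buffering happens solely for runs of equal keys on both sides.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical row shape for this sketch: (join key, payload).
Row = Tuple[int, str]
Joined = Tuple[Optional[Row], Optional[Row]]


def full_outer_merge_join(left: List[Row], right: List[Row]) -> Iterator[Joined]:
    """Full outer join of two inputs that are already sorted by join key."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            # Left row has no match: emit it directly, nothing is buffered.
            yield (left[i], None)
            i += 1
        elif left_key > right_key:
            # Right row has no match: emit it directly, nothing is buffered.
            yield (None, right[j])
            j += 1
        else:
            # Keys match: only now collect the run of equal keys on each side.
            # (In an engine that reuses row objects, this is where rows would
            # need to be copied; that detail is not modeled here.)
            left_buf: List[Row] = []
            while i < len(left) and left[i][0] == left_key:
                left_buf.append(left[i])
                i += 1
            right_buf: List[Row] = []
            while j < len(right) and right[j][0] == left_key:
                right_buf.append(right[j])
                j += 1
            for l_row in left_buf:
                for r_row in right_buf:
                    yield (l_row, r_row)
    # One side is exhausted: the remaining rows on the other side are unmatched.
    for k in range(i, len(left)):
        yield (left[k], None)
    for k in range(j, len(right)):
        yield (None, right[k])
```

For example, calling list(full_outer_merge_join(left, right)) on two sorted 
lists that share only a few keys buffers rows only for those shared keys; 
every other row passes straight through to the output.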



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

