Cheng Su created SPARK-37341:
--------------------------------

             Summary: Avoid unnecessary buffer and copy in full outer sort merge join
                 Key: SPARK-37341
                 URL: https://issues.apache.org/jira/browse/SPARK-37341
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Cheng Su

The FULL OUTER sort merge join (non-code-gen path) copies join keys and buffers input rows even when rows from the two sides do not have matched keys ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1637-L1641] ). This is unnecessary: we can output the row with the smaller join key directly, and buffer only when both sides have matched keys. This avoids needless copying and buffering when both join sides have many rows that do not match each other.
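
The idea described above can be illustrated with a small standalone sketch. This is not the Spark implementation (which is Scala and lives in SortMergeJoinExec.scala); it is a simplified Python model, assuming both inputs are already sorted by an integer join key, with no null keys and no extra join condition. The names `Row` and `full_outer_merge` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical row shape for illustration: (join key, payload).
Row = Tuple[int, str]
Joined = Tuple[Optional[Row], Optional[Row]]


def full_outer_merge(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Joined]:
    """Full outer join of two inputs that are already sorted by key.

    Rows are buffered only when both sides share the current key.
    """
    l_it, r_it = iter(left), iter(right)
    l, r = next(l_it, None), next(r_it, None)

    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Left key is smaller (or right is exhausted):
            # emit the left row right away, with no copy into a buffer.
            yield (l, None)
            l = next(l_it, None)
        elif l is None or r[0] < l[0]:
            # Right key is smaller (or left is exhausted): same on the other side.
            yield (None, r)
            r = next(r_it, None)
        else:
            # Keys match: only now collect all rows that share this key.
            key = l[0]
            l_buf: List[Row] = []
            while l is not None and l[0] == key:
                l_buf.append(l)
                l = next(l_it, None)
            r_buf: List[Row] = []
            while r is not None and r[0] == key:
                r_buf.append(r)
                r = next(r_it, None)
            for l_row in l_buf:
                for r_row in r_buf:
                    yield (l_row, r_row)


left_rows = [(1, "a"), (2, "b"), (4, "d")]
right_rows = [(2, "x"), (2, "y"), (3, "z")]
joined = list(full_outer_merge(left_rows, right_rows))
```

In this sketch, keys 1, 3 and 4 pass straight through without touching a buffer; only key 2, which appears on both sides, is buffered. The real operator also has to handle null join keys and an optional non-equi join condition, which this model leaves out.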