[ https://issues.apache.org/jira/browse/SPARK-32399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wensheng Wang updated SPARK-32399:
----------------------------------
    Attachment: Screen Shot 2021-03-09 at 3.06.30 PM.png

> Support full outer join in shuffled hash join
> ---------------------------------------------
>
>                 Key: SPARK-32399
>                 URL: https://issues.apache.org/jira/browse/SPARK-32399
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Minor
>             Fix For: 3.1.0
>
>         Attachments: Screen Shot 2020-10-14 at 11.08.37 PM.png, Screen Shot
> 2020-10-14 at 12.30.07 PM.png, Screen Shot 2021-03-09 at 3.06.30 PM.png
>
>
> Currently, for a SQL full outer join, Spark always uses a sort merge join,
> regardless of how large the join children are. Inspired by recent discussion
> in [https://github.com/apache/spark/pull/29130#discussion_r456502678] and
> [https://github.com/apache/spark/pull/29181], I think we can support full
> outer join in shuffled hash join as follows: when looking up stream side
> keys in the build side {{HashedRelation}}, mark which build side rows were
> matched inside the {{HashedRelation}}; then, after reading all rows from the
> stream side, output all non-matching rows from the build side based on the
> modified {{HashedRelation}}.
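
The approach described in the issue can be sketched in a few lines. This is only an illustration of the idea on in-memory lists, written in Python for brevity; it is not Spark's implementation. The function name `full_outer_hash_join`, the per-row `matched` flags, and the tuple-based rows are all assumptions made for the example, with a plain dictionary standing in for the build side {{HashedRelation}}.

```python
from collections import defaultdict


def full_outer_hash_join(stream_rows, build_rows, stream_key, build_key):
    """Full outer join of two row lists using a hash table on the build side.

    Hypothetical sketch: a dict plays the role of the build-side hash
    relation, and a list of booleans records which build rows were matched.
    """
    # Build phase: map each join key to the indices of build rows having it.
    table = defaultdict(list)
    for idx, row in enumerate(build_rows):
        table[build_key(row)].append(idx)

    # One "matched" flag per build row (the marking step from the issue).
    matched = [False] * len(build_rows)
    output = []

    # Probe phase: look up each stream row's key in the build-side table.
    for s in stream_rows:
        key = stream_key(s)
        # A NULL (None) key never matches anything, as in SQL join semantics.
        hits = table.get(key, []) if key is not None else []
        if hits:
            for idx in hits:
                matched[idx] = True
                output.append((s, build_rows[idx]))
        else:
            # Stream row without a partner: pad the build side with None.
            output.append((s, None))

    # Final pass: emit build rows that no stream row ever matched.
    for idx, b in enumerate(build_rows):
        if not matched[idx]:
            output.append((None, b))

    return output
```

For example, joining stream rows `[(1, "a"), (2, "b")]` with build rows `[(2, "x"), (3, "y")]` on the first field yields one matched pair for key 2, one stream-only row for key 1, and one build-only row for key 3. The extra pass over the build side is what distinguishes this from an inner or left outer hash join, and it is why the matched state has to be kept alongside the hash table until the stream side is exhausted.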