Chao Sun created SPARK-41471:
--------------------------------

             Summary: SPJ: reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning
                 Key: SPARK-41471
                 URL: https://issues.apache.org/jira/browse/SPARK-41471
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Chao Sun

When only one side of an SPJ (Storage-Partitioned Join) is {{KeyGroupedPartitioning}}, Spark currently needs to shuffle both sides using {{HashPartitioning}}. However, we may only need to shuffle the other side according to the partition transforms defined in {{KeyGroupedPartitioning}}. This is especially useful when the other side is relatively small.
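
To illustrate the idea, here is a toy sketch in plain Python. It is not Spark code and does not reflect how the change would be implemented. The {{bucket}} transform, the bucket count, and the sample rows are all hypothetical. It assumes one side is already grouped by a partition transform, and shows that routing only the other (small) side through the same transform is enough to join partition by partition.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket(key: int) -> int:
    """Hypothetical partition transform; a stand-in for a real bucket transform."""
    return key % NUM_BUCKETS


def group_by_transform(rows, transform):
    """Place each (key, value) row into the partition chosen by `transform`."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[transform(key)].append((key, value))
    return dict(groups)


def join_partitions(left, right):
    """Inner-join two partitioned sides, looking only at matching partitions."""
    out = []
    for part in sorted(left.keys() & right.keys()):
        lookup = defaultdict(list)
        for key, value in right[part]:
            lookup[key].append(value)
        for key, value in left[part]:
            for right_value in lookup.get(key, []):
                out.append((key, value, right_value))
    return sorted(out)


# Large side: assumed to be already laid out by the transform, so it is
# grouped here once to model that layout and is never re-shuffled.
large_side = group_by_transform([(k, f"order-{k}") for k in range(10)], bucket)

# Small side: arrives with no useful partitioning.
small_side = [(1, "alice"), (6, "bob"), (7, "carol")]

# Shuffle only the small side, using the large side's transform.
small_shuffled = group_by_transform(small_side, bucket)

joined = join_partitions(large_side, small_shuffled)
print(joined)
```

Because both sides agree on which partition a key belongs to, matching rows always meet in the same partition, and only the small side's rows had to move.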