[
https://issues.apache.org/jira/browse/SPARK-30751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Herman van Hövell resolved SPARK-30751.
---------------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
> Combine the skewed readers into one in AQE skew join optimizations
> ------------------------------------------------------------------
>
> Key: SPARK-30751
> URL: https://issues.apache.org/jira/browse/SPARK-30751
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wei Xue
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 3.0.0
>
>
> Assume we have N partitions based on the original join keys. For a
> specific partition id {{Pi}} (i = 1 to N), we slice the left partition into
> {{Li}} sub-partitions ({{Li}} = 1 if not skewed; {{Li}} > 1 if skewed) and
> the right partition into {{Mi}} sub-partitions ({{Mi}} = 1 if not skewed;
> {{Mi}} > 1 if skewed). With the current approach, we end up with the sum of
> {{Li}} * {{Mi}} joins over all i where Li > 1 or Mi > 1, plus one more join
> covering all the partitions without skew. *This can be a serious performance
> concern, as the size of the query plan now depends on the number and size of
> skewed partitions.*
> Now, instead of generating so many joins, we can create a “repeated” reader
> for either side of the join so that:
> # for the left side, with each partition id {{Pi}} and any given slice
> {{Sj}} in {{Pi}} (j = 1 to Li), it generates {{Mi}} repeated partitions with
> the respective join keys {{PiSjT1}}, {{PiSjT2}}, …, {{PiSjTMi}}
> # for the right side, with each partition id {{Pi}} and any given slice
> {{Tk}} in {{Pi}} (k = 1 to Mi), it generates {{Li}} repeated partitions with
> the respective join keys {{PiS1Tk}}, {{PiS2Tk}}, …, {{PiSLiTk}}
> That way, we can have one SMJ for all the partitions and only one type of
> special reader.
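To make the slice-pairing concrete, here is a minimal Python sketch (not
Spark's actual implementation; the function name and inputs are hypothetical)
of how each skewed partition's left slices {{Sj}} and right slices {{Tk}} are
expanded into the cross product of per-slice tasks that a single SMJ can then
consume:

```python
from itertools import product

def skew_join_tasks(left_slices, right_slices):
    """Pair every left slice with every right slice of each partition.

    left_slices[i] / right_slices[i] give Li / Mi for partition Pi
    (1 when that side of Pi is not skewed). Returns one (Pi, Sj, Tk)
    task per pair, so all partitions feed a single sort-merge join
    instead of one join per skewed pair.
    """
    tasks = []
    for pid, (li, mi) in enumerate(zip(left_slices, right_slices)):
        for j, k in product(range(li), range(mi)):
            # the left reader repeats slice Sj Mi times; the right
            # reader repeats slice Tk Li times; together they yield
            # exactly li * mi tasks for this partition
            tasks.append((pid, j, k))
    return tasks

# Partition 1 skewed on the left (3 slices), partition 2 on the right (2):
print(skew_join_tasks([1, 3, 1], [1, 1, 2]))
# -> [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 2, 0), (2, 0, 0), (2, 0, 1)]
```

The task count is the sum of Li * Mi over all partitions, matching the join
count of the old approach, but here they are tasks inside one join rather
than separate join operators in the plan.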
--
This message was sent by Atlassian Jira
(v8.3.4#803005)