[
https://issues.apache.org/jira/browse/SPARK-35264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-35264:
------------------------------------
Assignee: Apache Spark
> Support AQE side broadcastJoin threshold
> ----------------------------------------
>
> Key: SPARK-35264
> URL: https://issues.apache.org/jira/browse/SPARK-35264
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: ulysses you
> Assignee: Apache Spark
> Priority: Major
>
> The main idea here is to isolate the join config between the normal planner
> and the AQE planner, which share the same code path.
> We do not fully trust static statistics when deciding whether a broadcast
> hash join can be built. In our experience it is very common for Spark to
> throw a broadcast timeout or a driver-side OOM exception when executing a
> fairly large plan. A broadcast join is also not reversible: once we convert
> a join to a broadcast hash join the first time, we (AQE) cannot optimize it
> again. So it makes sense to decide whether to broadcast on the AQE side,
> using a different SQL config.
> To achieve this, we insert a specific join hint in advance within the AQE
> framework; JoinSelection then picks up and follows the inserted hint.
> For now we only support selecting a strategy for equi-joins, in this order:
> 1. mark join as broadcast hash join if possible
> 2. mark join as shuffled hash join if possible
> Note that we do not override the join strategy if the user specifies a join hint.
>
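The selection order described above can be sketched roughly as follows. This
is only an illustration in plain Python, not Spark's implementation: the
function name, its parameters, and the use of simple size thresholds to decide
"if possible" are all assumptions made for the example. Only the hint names
BROADCAST and SHUFFLE_HASH correspond to existing Spark join hints.

```python
from typing import Optional


def select_aqe_join_hint(
    is_equi_join: bool,
    user_hint: Optional[str],
    build_side_bytes: int,
    aqe_broadcast_threshold: int,
    aqe_shuffled_hash_threshold: int,
) -> Optional[str]:
    """Return the hint AQE would insert, or None to leave the join untouched.

    Hypothetical helper: every name and both thresholds are assumptions
    used to illustrate the order given in the issue description.
    """
    # A user-specified join hint is never overridden.
    if user_hint is not None:
        return None
    # Only equi-joins are supported for now.
    if not is_equi_join:
        return None
    # 1. Mark the join as a broadcast hash join if possible
    #    (assumed here to mean: the build side fits the AQE-side threshold).
    if build_side_bytes <= aqe_broadcast_threshold:
        return "BROADCAST"
    # 2. Otherwise mark it as a shuffled hash join if possible.
    if build_side_bytes <= aqe_shuffled_hash_threshold:
        return "SHUFFLE_HASH"
    # Neither applies: insert no hint and let the planner decide.
    return None
```

Because the thresholds passed in here are separate from whatever the normal
planner uses, the AQE-side decision can be tuned without affecting the static
planning path, which is the isolation the description asks for.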
--
This message was sent by Atlassian Jira
(v8.3.4#803005)