[ 
https://issues.apache.org/jira/browse/SPARK-53298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuyu updated SPARK-53298:
-------------------------
    Summary: make an isolation to control Shuffle partitionSizeInBytes 
converted from `REBALANCE` hint   (was: make an isolation to control AQE's 
`REBALANCE` hint partitionSizeInBytes)

> make an isolation to control Shuffle partitionSizeInBytes converted from 
> `REBALANCE` hint 
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53298
>                 URL: https://issues.apache.org/jira/browse/SPARK-53298
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: xuyu
>            Priority: Major
>              Labels: pull-request-available
>
> The main idea of this issue is to isolate the configuration of normal 
> shuffles from that of shuffles introduced by the `REBALANCE` hint. Today both 
> share the same parameter, 
> "spark.sql.adaptive.advisoryPartitionSizeInBytes".
> As is well known, the `REBALANCE` hint can be used to rebalance the query 
> result output partitions. It is only effective when AQE is enabled, and it 
> is converted to a ShuffleExchangeExec.
> When normal shuffles (introduced by the SQL itself or by an extra 
> `REPARTITION` hint) and a `REBALANCE` hint exist in the same query, all of 
> them are converted to ShuffleExchangeExec and then to AQE's 
> ShuffleQueryStageExec. Worse, if we change 
> "spark.sql.adaptive.advisoryPartitionSizeInBytes", the partition number and 
> size of ALL of these shuffles change. In this scenario, the roles of the 
> `REPARTITION` hint and the `REBALANCE` hint are similar. It is common for us 
> to modify this parameter to control the number of final output files and 
> reduce the small-files problem. So we want to isolate the configuration of 
> "spark.sql.adaptive.advisoryPartitionSizeInBytes" so that, when we use the 
> `REBALANCE` hint to control output partitions, the setting only affects the 
> shuffle introduced by the `REBALANCE` hint.
> To achieve this, we add an identifier that marks a shuffle as converted from 
> a `REBALANCE` hint. When the `REBALANCE` hint is used and AQE is enabled, the 
> following strategies apply:
> 1. If the rebalance-specific advisoryPartitionSizeInBytes is configured, the 
> shuffle converted from the `REBALANCE` hint uses 
> "spark.sql.adaptive.rebalance.advisoryPartitionSizeInBytes" to control 
> partition size.
> 2. If the rebalance-specific advisoryPartitionSizeInBytes is not configured, 
> the shuffle converted from the `REBALANCE` hint uses the original 
> advisoryPartitionSizeInBytes to control partition size, which is the same 
> behavior as before.
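
The two strategies above can be sketched in plain Python as a model of the
proposed lookup. This is only an illustration, not the Spark implementation:
the function names, the dict-based conf, and the `from_rebalance` flag are
hypothetical stand-ins for the identifier described above, and the partition
count is a rough ceiling estimate that ignores how AQE actually coalesces or
splits partitions.

```python
import math
from typing import Mapping

# Config keys as named in the issue description.
ADVISORY_KEY = "spark.sql.adaptive.advisoryPartitionSizeInBytes"
REBALANCE_ADVISORY_KEY = "spark.sql.adaptive.rebalance.advisoryPartitionSizeInBytes"

MiB = 1024 * 1024


def advisory_size_for(conf: Mapping[str, int], from_rebalance: bool) -> int:
    """Pick the advisory partition size (bytes) for one shuffle.

    `from_rebalance` is a hypothetical flag standing in for the identifier
    that marks a shuffle as converted from a REBALANCE hint.
    """
    # Strategy 1: the rebalance-specific key wins, but only for REBALANCE shuffles.
    if from_rebalance and REBALANCE_ADVISORY_KEY in conf:
        return conf[REBALANCE_ADVISORY_KEY]
    # Strategy 2 (and every normal shuffle): fall back to the shared key.
    return conf[ADVISORY_KEY]


def rough_partition_count(shuffle_bytes: int, advisory_bytes: int) -> int:
    """Very rough estimate: total shuffle size divided by the advisory size."""
    return max(1, math.ceil(shuffle_bytes / advisory_bytes))


# Example values chosen only for illustration.
conf = {ADVISORY_KEY: 64 * MiB, REBALANCE_ADVISORY_KEY: 256 * MiB}
shuffle_bytes = 1024 * MiB

normal = rough_partition_count(shuffle_bytes, advisory_size_for(conf, False))
rebalance = rough_partition_count(shuffle_bytes, advisory_size_for(conf, True))
print(normal, rebalance)  # prints "16 4"
```

With the rebalance-specific key removed from `conf`, both calls resolve to the
shared key, which models the unchanged behavior of strategy 2.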



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
