[ https://issues.apache.org/jira/browse/SPARK-53298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xuyu updated SPARK-53298:
-------------------------
Summary: make an isolation to control Shuffle partitionSizeInBytes converted from `REBALANCE` hint (was: make an isolation to control AQE's `REBALANCE` hint partitionSizeInBytes)

make an isolation to control Shuffle partitionSizeInBytes converted from `REBALANCE` hint
------------------------------------------------------------------------------------------

Key: SPARK-53298
URL: https://issues.apache.org/jira/browse/SPARK-53298
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.5.0, 4.0.0
Reporter: xuyu
Priority: Major
Labels: pull-request-available

The main idea of this issue is to separate the configuration of ordinary shuffles from that of shuffles introduced by the `REBALANCE` hint. Today both share the same parameter, "spark.sql.adaptive.advisoryPartitionSizeInBytes".

The `REBALANCE` hint can be used to rebalance the partitions of the query result output. It only takes effect when AQE is enabled, and it is converted to a ShuffleExchangeExec.

When ordinary shuffles (introduced by the SQL itself or by an extra `REPARTITION` hint) and a `REBALANCE` hint coexist, all of them are converted to ShuffleExchangeExec and then to AQE ShuffleQueryStageExec nodes. Worse, changing "spark.sql.adaptive.advisoryPartitionSizeInBytes" changes the partition count and size of ALL of these shuffles, not only the rebalance one. In this scenario, the `REPARTITION` hint and the `REBALANCE` hint play similar roles. It is common to tune this parameter to control the number of final output files and reduce the small-files problem. We therefore want to isolate "spark.sql.adaptive.advisoryPartitionSizeInBytes" so that, when the `REBALANCE` hint is used to control output partitions, the rebalance-specific setting affects ONLY the shuffle introduced by the `REBALANCE` hint. This is especially useful for controlling output file counts.
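To illustrate why a single shared setting moves every shuffle at once, here is a deliberately simplified model of AQE partition coalescing. It assumes contiguous shuffle partitions are packed greedily until the next one would exceed the advisory size. Spark's real algorithm considers more factors (for example, a minimum partition size and shuffles that belong to the same stage), and the shuffle sizes below are made-up numbers, not measurements.

```python
# Simplified model of AQE coalescing (NOT Spark's actual implementation):
# contiguous partitions are packed greedily up to the advisory target size.
MB = 1024 * 1024


def coalesce(partition_sizes, advisory_bytes):
    """Group contiguous partition indices so each group's total stays at or
    below advisory_bytes (an oversized single partition forms its own group)."""
    groups, current, current_size = [], [], 0
    for i, size in enumerate(partition_sizes):
        # Close the current group if adding this partition would overshoot.
        if current and current_size + size > advisory_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Two hypothetical shuffles in one query: a join shuffle and a REBALANCE shuffle.
join_shuffle = [20 * MB] * 200
rebalance_shuffle = [5 * MB] * 200

# With one shared advisory size, raising it to get fewer output files also
# shrinks the partition count of the unrelated join shuffle.
for advisory in (64 * MB, 256 * MB):
    print(f"{advisory // MB} MB ->",
          len(coalesce(join_shuffle, advisory)), "join partitions,",
          len(coalesce(rebalance_shuffle, advisory)), "rebalance partitions")
```

In this model, raising the shared value from 64 MB to 256 MB reduces the rebalance output from 17 to 4 partitions, but it also cuts the join shuffle from 67 to 17 partitions, which lowers its parallelism as a side effect.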
To achieve this, we add an identifier that marks a shuffle as converted from the `REBALANCE` hint. When the `REBALANCE` hint is used and AQE is enabled, the following strategies apply:

1. If the rebalance-specific advisory partition size is configured, the shuffle converted from the `REBALANCE` hint uses "spark.sql.adaptive.rebalance.advisoryPartitionSizeInBytes" to control partition size.
2. If the rebalance-specific advisory partition size is not configured, the shuffle converted from the `REBALANCE` hint uses the original "spark.sql.adaptive.advisoryPartitionSizeInBytes" to control partition size, which is the same behavior as before.
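The two strategies amount to a per-shuffle lookup with a fallback. The sketch below models that lookup in plain Python. `ShuffleKind` and `advisory_size` are hypothetical names standing in for the identifier described above. Sizes are plain byte counts rather than Spark's size strings such as "64MB". The 64 MB fallback assumes Spark's documented default for "spark.sql.adaptive.advisoryPartitionSizeInBytes". The rebalance key is the one proposed in this issue, not necessarily one present in a released Spark version.

```python
from enum import Enum, auto
from typing import Mapping

MB = 1024 * 1024

# Existing shared key, and the rebalance-specific key proposed in SPARK-53298.
ADVISORY_KEY = "spark.sql.adaptive.advisoryPartitionSizeInBytes"
REBALANCE_ADVISORY_KEY = "spark.sql.adaptive.rebalance.advisoryPartitionSizeInBytes"
DEFAULT_ADVISORY_BYTES = 64 * MB  # assumed default of the shared key


class ShuffleKind(Enum):
    """Hypothetical marker recording where a shuffle came from."""
    ORDINARY = auto()   # shuffles required by the plan or a REPARTITION hint
    REBALANCE = auto()  # shuffles converted from the REBALANCE hint


def advisory_size(kind: ShuffleKind, conf: Mapping[str, int]) -> int:
    """Return the advisory partition size (bytes) to use for one shuffle."""
    shared = conf.get(ADVISORY_KEY, DEFAULT_ADVISORY_BYTES)
    if kind is ShuffleKind.REBALANCE:
        # Strategy 1: use the rebalance-specific size when it is set.
        # Strategy 2: otherwise fall back to the shared size (old behavior).
        return conf.get(REBALANCE_ADVISORY_KEY, shared)
    # Ordinary shuffles never consult the rebalance-specific key.
    return shared


conf = {ADVISORY_KEY: 64 * MB, REBALANCE_ADVISORY_KEY: 512 * MB}
print(advisory_size(ShuffleKind.ORDINARY, conf) // MB)
print(advisory_size(ShuffleKind.REBALANCE, conf) // MB)
```

Because only the `REBALANCE` branch reads the new key, a job can raise the rebalance size to produce fewer, larger output files while join and aggregation shuffles keep the shared value. When the new key is absent, every shuffle resolves to the same value as before.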