[
https://issues.apache.org/jira/browse/SPARK-42912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
thomasgx closed SPARK-42912.
----------------------------
> Some cases do not take effect when using OptimizeSkewInRebalancePartitions
> --------------------------------------------------------------------------
>
> Key: SPARK-42912
> URL: https://issues.apache.org/jira/browse/SPARK-42912
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 3.3.0
> Environment: Spark 3.3.0
> Reporter: thomasgx
> Priority: Major
> Attachments: image-2023-03-24-11-30-42-239.png,
> image-2023-03-24-11-31-42-564.png, image-2023-03-24-11-34-34-070.png,
> image-2023-03-24-11-36-54-539.png, image-2023-03-24-11-37-42-289.png
>
>
> Question:
> When using OptimizeSkewInRebalancePartitions to insert into a Hive table with
> dynamic partitions (three partition levels, some of them skewed), the result
> depends on spark.sql.shuffle.partitions. With a relatively large value (10000),
> the written files do not follow the configured advisoryPartitionSizeInBytes:
> each skewed partition is processed by only one task and written into one file.
> With a smaller value (2000), the skewed partitions are optimized by
> OptimizeSkewInRebalancePartitions as expected: their data is processed in
> batches and written to files.
>
> Spark AQE config:
> spark.sql.adaptive.coalescePartitions.enabled true
> spark.sql.adaptive.skewedJoin.enabled true
> spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
> spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
> spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
> spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 1024M
>
> 10000 partitions:
> !image-2023-03-24-11-30-42-239.png|width=929,height=150!
>
>
> 2000 partitions:
> !image-2023-03-24-11-31-42-564.png|width=936,height=172!
>
>
> SQL time:
> !image-2023-03-24-11-34-34-070.png|width=962,height=220!
>
>
> plan:
> !image-2023-03-24-11-36-54-539.png|width=339,height=389!
>
>
>
> !image-2023-03-24-11-37-42-289.png|width=334,height=306!
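
The expectation described in the question (a skewed partition should be written as several files of roughly the advisory size, not as one file) can be sketched with simple arithmetic. This is only a rough illustration of that expectation, not Spark's splitting logic: the real split depends on shuffle statistics and will generally not match this count exactly. The helper names and the 5 GiB partition size are hypothetical, and using the 512M final-stage value from the reported config as the relevant advisory size is an assumption.

```python
import math


def parse_size(text: str) -> int:
    """Convert a size string such as '512M' into bytes (binary units)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    text = text.strip().upper()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def expected_output_files(partition_bytes: int, advisory: str) -> int:
    """Rough expectation only: ceil(partition size / advisory size), at least 1."""
    return max(1, math.ceil(partition_bytes / parse_size(advisory)))


# Hypothetical skewed partition of 5 GiB, checked against the 512M
# advisory size taken from the config listed in the report.
skewed_bytes = 5 * 1024**3
print(expected_output_files(skewed_bytes, "512M"))  # 10
```

Under these assumptions the reporter would expect about 10 files for such a partition, whereas the 10000-partition run produced a single file per skewed partition.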
--
This message was sent by Atlassian Jira
(v8.20.10#820010)