[jira] [Comment Edited] (SPARK-42912) Some cases do not take effect when using OptimizeSkewInRebalancePartitions

thomasgx (Jira) Thu, 23 Mar 2023 23:53:12 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-42912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704479#comment-17704479
 ]


thomasgx edited comment on SPARK-42912 at 3/24/23 6:52 AM:
-----------------------------------------------------------

Found this related issue:

https://issues.apache.org/jira/browse/SPARK-36967
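
A possible reason the linked issue is relevant, sketched in Python rather than taken from Spark itself: as far as I understand, once a shuffle has more than spark.shuffle.minNumPartitionsToHighlyCompress reduce partitions (2000 by default), each map task reports only an average size for blocks smaller than spark.shuffle.accurateBlockThreshold (100 MB by default). AQE would then see a far smaller size for a skewed partition than it really has. The toy model below illustrates that effect only. The function names, the 200 map tasks, and the 50 MB / 10 KB block sizes are made-up assumptions, not values from this ticket.

```python
MB = 1024 * 1024
KB = 1024

# Assumed defaults; check the values used by your own deployment.
ACCURATE_BLOCK_THRESHOLD = 100 * MB   # spark.shuffle.accurateBlockThreshold
MIN_PARTITIONS_TO_AVERAGE = 2000      # spark.shuffle.minNumPartitionsToHighlyCompress


def reported_sizes(block_sizes):
    """Sizes one map task would report under this simplified model (hypothetical helper)."""
    if len(block_sizes) <= MIN_PARTITIONS_TO_AVERAGE:
        # Few enough reduce partitions: every block size is kept individually.
        return list(block_sizes)
    # Otherwise non-empty blocks below the threshold are replaced by their average.
    small = [s for s in block_sizes if 0 < s < ACCURATE_BLOCK_THRESHOLD]
    avg = sum(small) // len(small) if small else 0
    return [s if s >= ACCURATE_BLOCK_THRESHOLD or s == 0 else avg
            for s in block_sizes]


def reported_total_for_skewed(num_reducers, num_maps, skewed_block, normal_block):
    """Total size the reduce side would see for the one skewed partition."""
    sizes = [normal_block] * num_reducers
    sizes[0] = skewed_block  # partition 0 is the skewed one
    return reported_sizes(sizes)[0] * num_maps


# Illustrative numbers: 200 map tasks, each writing 50 MB to the skewed partition.
actual_total = 200 * 50 * MB
seen_with_2000 = reported_total_for_skewed(2000, 200, 50 * MB, 10 * KB)
seen_with_10000 = reported_total_for_skewed(10000, 200, 50 * MB, 10 * KB)

print("actual bytes:              ", actual_total)
print("reported, 2000 partitions: ", seen_with_2000)
print("reported, 10000 partitions:", seen_with_10000)
```

In this model the 2000-partition run reports the true size, while the 10000-partition run reports a total far below the 512M advisory size, so nothing would be split. If that is what happens here, the setting added by SPARK-36967, spark.shuffle.accurateBlockSkewedFactor (disabled by default, to my knowledge), may be worth trying.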


was (Author: JIRAUSER299489):
https://issues.apache.org/jira/browse/SPARK-36967

> Some cases do not take effect when using OptimizeSkewInRebalancePartitions
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42912
>                 URL: https://issues.apache.org/jira/browse/SPARK-42912
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 3.3.0
>         Environment: spark3.3.0
>            Reporter: thomasgx
>            Priority: Major
>         Attachments: image-2023-03-24-11-30-42-239.png, 
> image-2023-03-24-11-31-42-564.png, image-2023-03-24-11-34-34-070.png, 
> image-2023-03-24-11-36-54-539.png, image-2023-03-24-11-37-42-289.png
>
>
> Question:
> When using OptimizeSkewInRebalancePartitions to insert into a Hive table with 
> dynamic partitions (three partition levels, skewed data), I found that when 
> spark.sql.shuffle.partitions is set to a relatively large value (10000), the 
> written files do not follow the configured advisoryPartitionSizeInBytes: the 
> skewed partition is processed by a single task and written into a single file. 
> When I reduce spark.sql.shuffle.partitions to 2000, the skewed partition is 
> optimized by OptimizeSkewInRebalancePartitions as expected: its data is split 
> into batches that are processed and written separately.
>  
> Spark AQE config:
> spark.sql.adaptive.coalescePartitions.enabled true
> spark.sql.adaptive.skewedJoin.enabled true
> spark.sql.adaptive.advisoryPartitionSizeInBytes 128M
> spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes 512M
> spark.sql.finalStage.adaptive.coalescePartitions.minPartitionSize 128M
> spark.sql.finalStage.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes  1024M
>  
> 10000 partitions:
> !image-2023-03-24-11-30-42-239.png|width=929,height=150!
>  
>  
> 2000 partitions:
> !image-2023-03-24-11-31-42-564.png|width=936,height=172!
>  
>  
> SQL time:
> !image-2023-03-24-11-34-34-070.png|width=962,height=220!
>  
>  
> plan:
> !image-2023-03-24-11-36-54-539.png|width=339,height=389!
>  
>  
>  
> !image-2023-03-24-11-37-42-289.png|width=334,height=306!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
