[ https://issues.apache.org/jira/browse/SPARK-35596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364654#comment-17364654 ]

zhengruifeng commented on SPARK-35596:
--------------------------------------

I think I encountered a similar case:

In a skewed stage, the metrics of shuffle read size are:

min: 13.2M, 25th percentile: 32.4M, median: 48.2M, 75th percentile: 80.6M, max: 13.2G

While in {OptimizeSkewedJoin}, the statistics show that the left side is even and all partitions are about 82M:

55986 Optimizing skewed join for [cast(batch_id#340 as bigint)], [batch_id#466L], LeftOuter.
55987 Left side (canSplit=true) partitions size info:
55988 median size: 85996393, max size: 85996393, min size: 85953247, avg size: 85995559
55989 Right side (canSplit=false) partitions size info:
55990 median size: 648168, max size: 648168, min size: 648168, avg size: 648168
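
If I read HighlyCompressedMapStatus correctly, it only keeps exact sizes for blocks larger than spark.shuffle.accurateBlockThreshold (100M by default) and reports every other non-empty block as the average of the small blocks, so a reduce partition can read 13.2G in total and still be estimated at ~82M as long as each individual map output block stays under the threshold. A minimal Scala sketch of that averaging effect (a simplified standalone model for illustration, not the actual MapStatus code; the sizes below are made up):

object MapStatusAveragingSketch {
  // Simplified model: blocks larger than the threshold keep their exact size,
  // every other non-empty block is reported as the average of the small blocks.
  def reportedSizes(blockSizes: Array[Long], accurateThreshold: Long): Array[Long] = {
    val small = blockSizes.filter(s => s > 0 && s <= accurateThreshold)
    val avgSmall = if (small.nonEmpty) small.sum / small.length else 0L
    blockSizes.map { s =>
      if (s > accurateThreshold) s   // recorded accurately
      else if (s == 0L) 0L           // empty blocks stay empty
      else avgSmall                  // anything else is flattened to the average
    }
  }

  def main(args: Array[String]): Unit = {
    val threshold = 100L * 1024 * 1024   // spark.shuffle.accurateBlockThreshold default
    // One map task writing 200 reduce blocks: block 0 is ~90M (skewed, but still
    // under the threshold), the other 199 are ~1M each.
    val actual = Array.fill(200)(1L * 1024 * 1024)
    actual(0) = 90L * 1024 * 1024
    val reported = reportedSizes(actual, threshold)
    println(s"actual max block:   ${actual.max}")    // 94371840 (~90M)
    println(s"reported max block: ${reported.max}")  // ~1.4M, the skew disappears
  }
}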
 

> HighlyCompressedMapStatus should record accurately the size of skewed shuffle 
> blocks
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-35596
>                 URL: https://issues.apache.org/jira/browse/SPARK-35596
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.2, 3.1.2
>            Reporter: exmy
>            Priority: Major
>
> HighlyCompressedMapStatus currently cannot accurately record the size of
> shuffle blocks that are much larger than the other blocks but still smaller
> than `spark.shuffle.accurateBlockThreshold`, which can lead to OOM when
> fetching shuffle blocks. We have to tune some extra properties like
> `spark.reducer.maxReqsInFlight` to prevent it, so it is better to fix this in
> HighlyCompressedMapStatus.
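
For reference, the mitigation the description mentions (capping in-flight fetch requests, and optionally lowering the accurate-block threshold so larger blocks are tracked exactly) could look roughly like the sketch below when run through spark-submit; the property names are existing Spark configs, but the values are illustrative, not recommendations:

import org.apache.spark.sql.SparkSession

object SkewFetchTuningExample {
  def main(args: Array[String]): Unit = {
    // Illustrative values only; the right numbers depend on executor memory
    // and the actual block-size distribution.
    val spark = SparkSession.builder()
      .appName("skew-fetch-tuning-example")
      // Record exact sizes for blocks above ~8M instead of the 100M default,
      // so moderately large blocks are no longer hidden behind the average.
      .config("spark.shuffle.accurateBlockThreshold", 8L * 1024 * 1024)
      // Cap concurrent fetch requests per reduce task to bound the memory
      // held by in-flight shuffle blocks.
      .config("spark.reducer.maxReqsInFlight", 16L)
      // Also bound the total bytes fetched concurrently (default is 48m).
      .config("spark.reducer.maxSizeInFlight", "24m")
      .getOrCreate()

    // ... run the skewed join here ...

    spark.stop()
  }
}

Lowering spark.shuffle.accurateBlockThreshold keeps more exact block sizes in the map status, so it trades some memory for better fetch planning; it is a workaround rather than a fix for the averaging itself.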



