[ 
https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-42784:
----------------------------------------
    Fix Version/s: 3.3.3
                   3.5.0
                   3.4.2

> Fix the problem of incomplete creation of subdirectories in push merged 
> localDir
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42784
>                 URL: https://issues.apache.org/jira/browse/SPARK-42784
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Fencheng Mei
>            Assignee: Fencheng Mei
>            Priority: Major
>             Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> After we massively enabled push-based shuffle in our production environment, 
> we found some warn messages appearing in the server-side log messages.
> the warning log like:
> ShuffleBlockPusher: Pushing block shufflePush_3_0_5352_935 to 
> BlockManagerId(shuffle-push-merger, zw06-data-hdp-dn08251.mt, 7337, None) 
> failed.
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot initialize 
> merged shuffle partition for appId application_1671244879475_44020960 
> shuffleId 3 shuffleMergeId 0 reduceId 935.
> After investigation, we identified the triggering mechanism of the bug。
> The driver requested two different containers on the same physical machine. 
> During the creation of the 'push-merged' directory in the first container 
> (container_1), the mergeDir was created first, then the subDir were created 
> based on the value of the "spark.diskStore.subDirectories" parameter. 
> However, the resources of container_1 were preempted during the creation of 
> the sub-directories, resulting in subDir not being created (only part of it 
> was created ). As the mergeDir still existed, the second container 
> (container_2) was unable to create further subDir (as it assumed that all 
> directories had already been created).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to