[ 
https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Zhou updated SPARK-35546:
----------------------------
    Summary: Properly handle race conditions in RemoteBlockPushResolver for 
access to the internal ConcurrentHashMaps to handle multiple app attempts  
(was: Properly handle race conditions in RemoteBlockPushResolver for access to 
the internal ConcurrentHashMaps)

> Properly handle race conditions in RemoteBlockPushResolver for access to the 
> internal ConcurrentHashMaps to handle multiple app attempts
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35546
>                 URL: https://issues.apache.org/jira/browse/SPARK-35546
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Ye Zhou
>            Priority: Major
>
> In the current implementation of RemoteBlockPushResolver, two 
> ConcurrentHashMaps are used to store: #1 applicationId -> 
> mergedShuffleLocalDirPath, and #2 applicationId+attemptId+shuffleID -> 
> mergedShufflePartitionInfo. Four types of messages (ExecutorRegister, 
> PushBlocks, FinalizeShuffleMerge and ApplicationRemove) trigger different 
> operations on these two maps, so the information stored in them must be 
> kept strongly consistent. Otherwise, there will be either data 
> corruption/correctness issues or a memory leak in the shuffle server. 
> We should come up with a systematic way to resolve this, rather than 
> spot-fixing the potential issues.
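
To make the consistency concern above concrete, here is a minimal sketch of one possible approach, not the actual RemoteBlockPushResolver code. It assumes that all per-application state (attempt id, merged directory, per-shuffle info) is held in a single map value, so that each of the four message types changes state inside one atomic compute()/computeIfPresent()/remove() call. The class name AttemptAwareRegistry, its methods, and the block counter standing in for the partition info are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from Spark's RemoteBlockPushResolver.
public class AttemptAwareRegistry {
    // All per-application state lives in one value, so a single atomic map call guards it.
    static final class AppState {
        final int attemptId;
        final String mergedDir;
        // shuffleId -> number of pushed blocks (a stand-in for real partition info).
        final ConcurrentHashMap<Integer, Integer> pushedBlocks = new ConcurrentHashMap<>();

        AppState(int attemptId, String mergedDir) {
            this.attemptId = attemptId;
            this.mergedDir = mergedDir;
        }
    }

    private final ConcurrentHashMap<String, AppState> apps = new ConcurrentHashMap<>();

    // ExecutorRegister: a newer attempt atomically replaces the older attempt and its shuffle state.
    public boolean registerExecutor(String appId, int attemptId, String mergedDir) {
        AppState state = apps.compute(appId, (id, cur) ->
            (cur == null || attemptId > cur.attemptId) ? new AppState(attemptId, mergedDir) : cur);
        return state.attemptId == attemptId;
    }

    // PushBlocks: accepted only if the app is registered and the attempt is the current one.
    public boolean pushBlock(String appId, int attemptId, int shuffleId) {
        boolean[] accepted = {false};
        apps.computeIfPresent(appId, (id, cur) -> {
            if (cur.attemptId == attemptId) {
                cur.pushedBlocks.merge(shuffleId, 1, Integer::sum);
                accepted[0] = true;
            }
            return cur;
        });
        return accepted[0];
    }

    // FinalizeShuffleMerge: returns the pushed-block count and drops the shuffle entry,
    // or returns -1 if the app is unknown or the attempt is stale.
    public int finalizeShuffle(String appId, int attemptId, int shuffleId) {
        int[] merged = {-1};
        apps.computeIfPresent(appId, (id, cur) -> {
            if (cur.attemptId == attemptId) {
                Integer n = cur.pushedBlocks.remove(shuffleId);
                merged[0] = (n == null) ? 0 : n;
            }
            return cur;
        });
        return merged[0];
    }

    // ApplicationRemove: one remove drops the directory and all shuffle state together.
    public boolean removeApplication(String appId) {
        return apps.remove(appId) != null;
    }
}
```

Because ConcurrentHashMap runs the remapping function atomically per key, a late PushBlocks from an older attempt cannot interleave with the registration of a newer attempt for the same application, and removing the application cannot leave orphaned shuffle entries behind in a second map.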



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
