[ 
https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Zhou updated SPARK-35546:
----------------------------
        Parent:     (was: SPARK-33235)
    Issue Type: Bug  (was: Sub-task)

> Enable push-based shuffle when multiple app attempts are enabled and manage 
> concurrent access to the state in a better way 
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35546
>                 URL: https://issues.apache.org/jira/browse/SPARK-35546
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Ye Zhou
>            Priority: Major
>
> In the current implementation of RemoteBlockPushResolver, two 
> ConcurrentHashmap are used to store #1 applicationId -> 
> mergedShuffleLocalDirPath #2 applicationId+attemptId+shuffleID -> 
> mergedShuffleParitionInfo. As there are four types of messages: 
> ExecutorRegister, PushBlocks, FinalizeShuffleMerge and ApplicationRemove, 
> will trigger different types of operations within these two hashmaps, it is 
> required to maintain strong consistency about the informations stored in 
> these two hashmaps. Otherwise, either there will be data 
> corruption/correctness issues or memory leak in shuffle server. 
> We should come up with systematic way to resolve this, other than spot fixing 
> the potential issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to