Ye Zhou created SPARK-35546:
-------------------------------

             Summary: Handling race condition and memory leak in 
RemoteBlockPushResolver
                 Key: SPARK-35546
                 URL: https://issues.apache.org/jira/browse/SPARK-35546
             Project: Spark
          Issue Type: Sub-task
          Components: Shuffle
    Affects Versions: 3.1.0
            Reporter: Ye Zhou


In the current implementation of RemoteBlockPushResolver, two 
ConcurrentHashMaps are used to store (1) applicationId -> 
mergedShuffleLocalDirPath and (2) applicationId+attemptId+shuffleID -> 
mergedShufflePartitionInfo. Four types of messages (ExecutorRegister, 
PushBlocks, FinalizeShuffleMerge and ApplicationRemove) each trigger different 
operations on these two maps, so the information stored in them must be kept 
strongly consistent. Otherwise, there will be either data 
corruption/correctness issues or a memory leak in the shuffle server. 

We should come up with a systematic way to resolve this, rather than 
spot-fixing the potential issues.
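
As one illustration of the kind of race in question: a PushBlocks message 
handled concurrently with ApplicationRemove could add partition info for an 
application whose entry in the other map was just removed, leaving state that 
is never cleaned up. The sketch below shows one possible direction, not the 
actual RemoteBlockPushResolver code or a proposed patch: keep all per-app 
state under a single map entry and mutate it only inside atomic per-key 
operations. The class AppStateRegistry, its methods, and the String-typed 
partition info are hypothetical names used only for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one map keyed by applicationId owns all per-app state.
final class AppStateRegistry {

  // Hypothetical holder for everything tracked on behalf of one application.
  static final class AppState {
    final String mergedShuffleLocalDirPath;
    // Key stands in for attemptId+shuffleID; value stands in for partition info.
    final Map<String, String> partitions = new ConcurrentHashMap<>();

    AppState(String mergedShuffleLocalDirPath) {
      this.mergedShuffleLocalDirPath = mergedShuffleLocalDirPath;
    }
  }

  private final ConcurrentHashMap<String, AppState> apps = new ConcurrentHashMap<>();

  // ExecutorRegister: create the app entry once; later registrations are no-ops.
  void registerExecutor(String appId, String localDir) {
    apps.putIfAbsent(appId, new AppState(localDir));
  }

  // PushBlocks: record partition info only while the app entry still exists.
  // computeIfPresent runs atomically per key, so it cannot interleave with
  // removeApplication for the same appId.
  boolean pushBlock(String appId, String shuffleKey, String partitionInfo) {
    return apps.computeIfPresent(appId, (id, state) -> {
      state.partitions.put(shuffleKey, partitionInfo);
      return state;
    }) != null;
  }

  // FinalizeShuffleMerge: detach and return the partition info, or null if
  // the app or the shuffle is unknown.
  String finalizeShuffleMerge(String appId, String shuffleKey) {
    String[] result = new String[1];
    apps.computeIfPresent(appId, (id, state) -> {
      result[0] = state.partitions.remove(shuffleKey);
      return state;
    });
    return result[0];
  }

  // ApplicationRemove: dropping the single entry drops all nested state.
  boolean removeApplication(String appId) {
    return apps.remove(appId) != null;
  }

  // Number of partition entries currently tracked for the app (0 if unknown).
  int trackedPartitions(String appId) {
    AppState state = apps.get(appId);
    return state == null ? 0 : state.partitions.size();
  }
}
```

With this layout, a push that arrives after removeApplication is rejected 
instead of re-creating state, and removing the app entry cannot leave 
partition info behind in a second map. Real cleanup (closing files, deleting 
merged data) is omitted here.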



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

