Ye Zhou created SPARK-35546:
-------------------------------
Summary: Handling race condition and memory leak in
RemoteBlockPushResolver
Key: SPARK-35546
URL: https://issues.apache.org/jira/browse/SPARK-35546
Project: Spark
Issue Type: Sub-task
Components: Shuffle
Affects Versions: 3.1.0
Reporter: Ye Zhou
The current implementation of RemoteBlockPushResolver uses two
ConcurrentHashMaps to store:
#1 applicationId -> mergedShuffleLocalDirPath
#2 applicationId+attemptId+shuffleID -> mergedShufflePartitionInfo
Four types of messages (ExecutorRegister, PushBlocks, FinalizeShuffleMerge and
ApplicationRemove) trigger different operations on these two maps, so the
information stored in them must be kept strongly consistent. Otherwise, there
will be either data corruption/correctness issues or a memory leak in the
shuffle server.
We should come up with a systematic way to resolve this, rather than spot-fixing
each potential issue.
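To illustrate one possible direction (a minimal sketch only, not the actual
RemoteBlockPushResolver code): the two maps could be folded into a single
ConcurrentHashMap keyed by applicationId, with the per-shuffle partition info
nested inside the per-application value. All names below (MergeStateSketch,
AppState, the method names, and the String stand-in for the partition info) are
hypothetical. Each message handler mutates state inside computeIfAbsent or
computeIfPresent, which ConcurrentHashMap performs atomically per key, so a
PushBlocks racing with ApplicationRemove is either applied before the removal
(and dropped together with the application) or rejected after it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one map entry per application owns all of its state.
final class MergeStateSketch {

    // Hypothetical per-application state: local dir path plus per-shuffle info.
    static final class AppState {
        final String mergedShuffleLocalDirPath;
        // Key stands in for attemptId+shuffleID; value stands in for the
        // merged shuffle partition info.
        final Map<String, String> partitionInfoByShuffle = new ConcurrentHashMap<>();

        AppState(String mergedShuffleLocalDirPath) {
            this.mergedShuffleLocalDirPath = mergedShuffleLocalDirPath;
        }
    }

    private final ConcurrentHashMap<String, AppState> apps = new ConcurrentHashMap<>();

    // ExecutorRegister: create the application entry if it does not exist yet.
    void registerExecutor(String appId, String localDirPath) {
        apps.computeIfAbsent(appId, id -> new AppState(localDirPath));
    }

    // PushBlocks: record partition info only while the application is registered.
    // Returns false if the application is unknown or already removed.
    boolean pushBlock(String appId, String shuffleKey, String info) {
        return apps.computeIfPresent(appId, (id, state) -> {
            state.partitionInfoByShuffle.put(shuffleKey, info);
            return state;
        }) != null;
    }

    // FinalizeShuffleMerge: detach and return the partition info for one shuffle,
    // or null if there is nothing to finalize.
    String finalizeShuffleMerge(String appId, String shuffleKey) {
        String[] removed = new String[1];
        apps.computeIfPresent(appId, (id, state) -> {
            removed[0] = state.partitionInfoByShuffle.remove(shuffleKey);
            return state;
        });
        return removed[0];
    }

    // ApplicationRemove: drop the application and all nested shuffle state at once.
    boolean removeApplication(String appId) {
        return apps.remove(appId) != null;
    }

    // Number of shuffles still tracked for an application (0 if not registered).
    int trackedShuffles(String appId) {
        AppState state = apps.get(appId);
        return state == null ? 0 : state.partitionInfoByShuffle.size();
    }
}
```

With this layout there is no second map that can fall out of sync: removing the
application entry discards its shuffle state in the same step, and a late
PushBlocks cannot recreate an entry for a removed application. Cleanup of merged
files on disk is deliberately left out of the sketch.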