[
https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chandni Singh updated SPARK-35546:
----------------------------------
Summary: Enable push-based shuffle when multiple app attempts are enabled
and manage concurrent access to the state in a better way (was: Properly
handle race conditions in RemoteBlockPushResolver to support push based shuffle
with multiple app attempts enabled)
> Enable push-based shuffle when multiple app attempts are enabled and manage
> concurrent access to the state in a better way
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-35546
> URL: https://issues.apache.org/jira/browse/SPARK-35546
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle
> Affects Versions: 3.1.0
> Reporter: Ye Zhou
> Priority: Major
>
> The current implementation of RemoteBlockPushResolver uses two
> ConcurrentHashMaps to store (1) applicationId ->
> mergedShuffleLocalDirPath and (2) applicationId+attemptId+shuffleId ->
> mergedShufflePartitionInfo. Four types of messages (ExecutorRegister,
> PushBlocks, FinalizeShuffleMerge and ApplicationRemove) trigger different
> operations on these two maps, so the information stored in them must be
> kept strongly consistent. Otherwise, the shuffle server may suffer data
> corruption/correctness issues or memory leaks.
> We should come up with a systematic way to resolve this, rather than
> spot-fixing the potential issues.
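
One possible shape for such a systematic fix, sketched here for illustration only: keep all state for an application under a single map entry and make every message handler go through an atomic per-key update. This is not the actual RemoteBlockPushResolver code. The class MergeStateSketch, its method names, and the String placeholder for partition info are all hypothetical; only ConcurrentHashMap and its compute/computeIfPresent/remove methods are real JDK APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the actual RemoteBlockPushResolver implementation. */
class MergeStateSketch {

  /** All state for the current attempt of one application, held under one map entry. */
  static final class AppAttemptState {
    final int attemptId;
    final String mergedShuffleLocalDirPath;
    // shuffleId -> partition info (a String stands in for the real structure)
    final Map<Integer, String> partitions = new ConcurrentHashMap<>();

    AppAttemptState(int attemptId, String mergedShuffleLocalDirPath) {
      this.attemptId = attemptId;
      this.mergedShuffleLocalDirPath = mergedShuffleLocalDirPath;
    }
  }

  // Single source of truth: applicationId -> state of its latest attempt.
  private final ConcurrentHashMap<String, AppAttemptState> apps = new ConcurrentHashMap<>();

  /** ExecutorRegister: a newer attempt atomically replaces the older attempt's state. */
  void registerExecutor(String appId, int attemptId, String dir) {
    apps.compute(appId, (id, cur) ->
        (cur == null || attemptId > cur.attemptId) ? new AppAttemptState(attemptId, dir) : cur);
  }

  /** PushBlocks: accepted only when it targets the currently registered attempt. */
  boolean pushBlock(String appId, int attemptId, int shuffleId, String info) {
    boolean[] accepted = {false};
    apps.computeIfPresent(appId, (id, cur) -> {
      if (cur.attemptId == attemptId) {
        cur.partitions.put(shuffleId, info);
        accepted[0] = true;
      }
      return cur; // keep the entry unchanged otherwise
    });
    return accepted[0];
  }

  /** FinalizeShuffleMerge: removes and returns the partition info, or null if absent or stale. */
  String finalizeShuffleMerge(String appId, int attemptId, int shuffleId) {
    String[] result = {null};
    apps.computeIfPresent(appId, (id, cur) -> {
      if (cur.attemptId == attemptId) {
        result[0] = cur.partitions.remove(shuffleId);
      }
      return cur;
    });
    return result[0];
  }

  /** ApplicationRemove: the directory path and all partition info go away together. */
  void removeApplication(String appId) {
    apps.remove(appId);
  }

  /** Number of applications currently tracked. */
  int trackedApps() {
    return apps.size();
  }
}
```

Because the directory path and the partition info live in the same entry, removing an application cannot leave orphaned partition info behind, and a push from a stale attempt is rejected in the same atomic step that checks the attempt id.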