RexXiong commented on PR #3100:
URL: https://github.com/apache/celeborn/pull/3100#issuecomment-2683807339

   > > @aidar-stripe You can refer to the code
   > > 
https://github.com/apache/celeborn/blob/f09482108fe91f986b9cfceebd4b050379bf6f64/client/src/main/scala/org/apache/celeborn/client/commit/ReducePartitionCommitHandler.scala#L154-L166
   > > 
   > > .
   > > You will discover that only one thread will write to the partition 
location set at a time. Here is no concurrent issue.
   > 
   > That's correct synchronization on `inProcessStageEndShuffleSet` of 
`ReducePartitionCommitHandler#tryFinalCommit` would ensure that only one thread 
would complete the commit. The reasoning behind the race here is that:
   > 
   > * Commit handler (thread 1) calls `collectResult` and populates the 
`reducerFileGroupsMap: ConcurrentHashMap[Int, ConcurrentHashMap[Integer, 
util.Set[PartitionLocation]]]`  with non-thread safe `HashSet` container
   > * At the same time, if `handleGetReducerFileGroup` (thread 2) call comes 
in, it might not see the element in `HashSet`  in here 
https://github.com/apache/celeborn/blob/f09482108fe91f986b9cfceebd4b050379bf6f64/client/src/main/scala/org/apache/celeborn/client/commit/ReducePartitionCommitHandler.scala#L351
   > 
   > I think the closest analogy here would be this particular example: 
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-volatiles-wrong.
 I don't think that there's a huge risk of this happening - for it to happen 
`handleGetReducerFileGroup` would have to come in at the same time as 
completing commit.
   > 
   > I'm happy with leaving this as is, since we've added some additional 
integrity checks on our side. But it feels like changing HashSet to concurrent 
version should be relatively cheap, especially considering that all the rest of 
the structures are concurrent. Regardless of the decision - thanks for 
responding and reviewing!
   
   Sounds reasonable, although such situations are rare, it's better to change 
it to concurrent.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to