RexXiong commented on PR #3100: URL: https://github.com/apache/celeborn/pull/3100#issuecomment-2683807339
> > @aidar-stripe You can refer to the code > > https://github.com/apache/celeborn/blob/f09482108fe91f986b9cfceebd4b050379bf6f64/client/src/main/scala/org/apache/celeborn/client/commit/ReducePartitionCommitHandler.scala#L154-L166 > > > > . > > You will discover that only one thread will write to the partition location set at a time. Here is no concurrent issue. > > That's correct synchronization on `inProcessStageEndShuffleSet` of `ReducePartitionCommitHandler#tryFinalCommit` would ensure that only one thread would complete the commit. The reasoning behind the race here is that: > > * Commit handler (thread 1) calls `collectResult` and populates the `reducerFileGroupsMap: ConcurrentHashMap[Int, ConcurrentHashMap[Integer, util.Set[PartitionLocation]]]` with non-thread safe `HashSet` container > * At the same time, if `handleGetReducerFileGroup` (thread 2) call comes in, it might not see the element in `HashSet` in here https://github.com/apache/celeborn/blob/f09482108fe91f986b9cfceebd4b050379bf6f64/client/src/main/scala/org/apache/celeborn/client/commit/ReducePartitionCommitHandler.scala#L351 > > I think the closest analogy here would be this particular example: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-volatiles-wrong. I don't think that there's a huge risk of this happening - for it to happen `handleGetReducerFileGroup` would have to come in at the same time as completing commit. > > I'm happy with leaving this as is, since we've added some additional integrity checks on our side. But it feels like changing HashSet to concurrent version should be relatively cheap, especially considering that all the rest of the structures are concurrent. Regardless of the decision - thanks for responding and reviewing! Sounds reasonable, although such situations are rare, it's better to change it to concurrent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
