wangshengjie123 commented on code in PR #2532:
URL: https://github.com/apache/celeborn/pull/2532#discussion_r1643716325


##########
client/src/main/scala/org/apache/celeborn/client/ChangePartitionManager.scala:
##########
@@ -151,15 +170,22 @@ class ChangePartitionManager(
       oldPartition,
       cause)
 
-    requests.synchronized {
-      if (requests.containsKey(partitionId)) {
-        requests.get(partitionId).add(changePartition)
-        logTrace(s"[handleRequestPartitionLocation] For $shuffleId, request for same partition" +
-          s"$partitionId-$oldEpoch exists, register context.")
-        return
+    val locksForShuffle = locks.computeIfAbsent(shuffleId, locksRegisterFunc)
+    locksForShuffle(partitionId % locksForShuffle.length).synchronized {
+      var newEntry = false
+      val set = requests.computeIfAbsent(
+        partitionId,
+        new java.util.function.Function[Integer, util.Set[ChangePartitionRequest]] {
+          override def apply(t: Integer): util.Set[ChangePartitionRequest] = {
+            newEntry = true
+            new util.HashSet[ChangePartitionRequest]()
+          }
+        })
+
+      if (newEntry) {
+        logTrace(s"[handleRequestPartitionLocation] For $shuffleId, register request for " +
+          s"partition $partitionId-$oldEpoch")
       } else {
-        // If new slot for the partition has been allocated, reply and return.
-        // Else register and allocate for it.
         getLatestPartition(shuffleId, partitionId, oldEpoch).foreach { latestLoc =>

Review Comment:
   @FMX Is this the same issue as @RexXiong's comment? It will cause Revive requests with an epoch greater than 0 to time out.
   
   There are two possible fixes:
   
   - Allow duplicate revive requests with the same epoch and check for them in `handleRequestPartitions`.
   - Don't put an empty partition record into `requests`.
   ```
   val set = requests.computeIfAbsent(
     partitionId,
     new java.util.function.Function[Integer, util.Set[ChangePartitionRequest]] {
       override def apply(t: Integer): util.Set[ChangePartitionRequest] = {
         newEntry = true
         new util.HashSet[ChangePartitionRequest]()
       }
     })
   ```
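
   A rough sketch of one way to read the second option, not a tested patch: create the set and add the request in a single `compute` call, so `requests` never holds an empty set. `Req`, `RegisterSketch`, and `register` are hypothetical names for illustration only; the real code would use `ChangePartitionRequest` and stay inside the existing per-partition lock.

   ```scala
   import java.util
   import java.util.concurrent.ConcurrentHashMap

   // Hypothetical stand-in for ChangePartitionRequest (assumption, not the real class).
   final case class Req(partitionId: Int, epoch: Int)

   object RegisterSketch {
     val requests = new ConcurrentHashMap[Integer, util.Set[Req]]()

     // Adds `req` to the set for its partition and returns true if this call
     // created the entry. The set is populated inside `compute`, so an entry
     // that exists in `requests` always contains at least one request.
     def register(req: Req): Boolean = {
       var newEntry = false
       requests.compute(
         Int.box(req.partitionId),
         new java.util.function.BiFunction[Integer, util.Set[Req], util.Set[Req]] {
           override def apply(id: Integer, existing: util.Set[Req]): util.Set[Req] = {
             val set =
               if (existing == null) {
                 newEntry = true
                 new util.HashSet[Req]()
               } else {
                 existing
               }
             set.add(req)
             set
           }
         })
       newEntry
     }
   }
   ```

   With this shape, a later request for the same partition with a higher epoch is stored alongside the first one rather than finding an empty placeholder. Whether that is enough to avoid the timeout depends on how `handleRequestPartitions` drains the set.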


