smengcl commented on code in PR #5751:
URL: https://github.com/apache/ozone/pull/5751#discussion_r1423765648


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java:
##########
@@ -230,21 +231,19 @@ public ReferenceCounted<IOmMetadataReader, SnapshotCache> 
get(String key,
    * @param key snapshot table key
    */
   public void release(String key) {
-    dbMap.compute(key, (k, v) -> {
-      if (v == null) {
-        throw new IllegalArgumentException(
-            "Key '" + key + "' does not exist in cache");
-      }
+    ReferenceCounted<IOmMetadataReader, SnapshotCache>
+        rcOmSnapshot = dbMap.get(key);
+    if (rcOmSnapshot == null) {
+      throw new IllegalArgumentException(
+          "Key '" + key + "' does not exist in cache");
+    }
 
-      if (v.decrementRefCount() == 0L) {
-        synchronized (pendingEvictionList) {
-          // v is eligible to be evicted and closed
-          pendingEvictionList.add(v);
-        }
+    if (rcOmSnapshot.decrementRefCount() == 0L) {
+      synchronized (pendingEvictionList) {
+        // v is eligible to be evicted and closed
+        pendingEvictionList.add(rcOmSnapshot);
       }
-
-      return v;
-    });
+    }

Review Comment:
   I figure I would outline two possible race condition scenarios without the 
`dbMap` lock here. Though I believe both are already protected by my existing 
implementation.
   
   
   Case 1:
   
   Since `pendingEvictionList` is a `Set`, it should be fine if multiple 
threads are calling `release(key)` as the same time while some of those threads 
observed that `rcOmSnapshot`'s ref count reached `0` at the same time. So the 
logic here would still be correct in this case.
   
   
   Case 2: Multiple threads interleaving `get()` and `release()`
   
   1. Thread 1 executes `get(key)` right past 
`rcOmSnapshot.incrementRefCount()` (after the change in this PR). ref count is 
1 at this point.
   2. Thread 2 executes `release(key)` past `rcOmSnapshot.decrementRefCount()`. 
ref count is 0 at this point. (*)
   3. Thread 2 continues to execute `pendingEvictionList.add(rcOmSnapshot)`, 
pendingEvictionList now has the snapshot, if `cacheSizeLimit` is exceeded, 
`cleanup()` will pick up the snapshot immediately.
   4. Thread 1 executes `pendingEvictionList.remove(rcOmSnapshot)`. snapshot is 
(supposed to be) removed from eviction list while the snapshot is already 
removed from `dbMap` (cleaned up) by `cleanup()`.
   
   (*) While actually Step 2 in Case 2 above would be prevented because 
`decrementRefCount()` throws if a thread tries to `release()` a snapshot it had 
not `get()` before, as a safety mechanism. And this safety, together with 
atomic ref count should also prevent Case 1 from happening in the first place 
(for which multiple threads would observe that ref count reached `0` at the 
same time).
   
   
   `cleanup()`, as of this time, is only called at the end of each `get()` and 
`release()`. As it doesn't run in a seperate thread it should not pose extra 
correctness issue at least for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to