smengcl commented on code in PR #5751:
URL: https://github.com/apache/ozone/pull/5751#discussion_r1423765648
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java:
##########
@@ -230,21 +231,19 @@ public ReferenceCounted<IOmMetadataReader, SnapshotCache>
get(String key,
* @param key snapshot table key
*/
public void release(String key) {
- dbMap.compute(key, (k, v) -> {
- if (v == null) {
- throw new IllegalArgumentException(
- "Key '" + key + "' does not exist in cache");
- }
+ ReferenceCounted<IOmMetadataReader, SnapshotCache>
+ rcOmSnapshot = dbMap.get(key);
+ if (rcOmSnapshot == null) {
+ throw new IllegalArgumentException(
+ "Key '" + key + "' does not exist in cache");
+ }
- if (v.decrementRefCount() == 0L) {
- synchronized (pendingEvictionList) {
- // v is eligible to be evicted and closed
- pendingEvictionList.add(v);
- }
+ if (rcOmSnapshot.decrementRefCount() == 0L) {
+ synchronized (pendingEvictionList) {
+ // v is eligible to be evicted and closed
+ pendingEvictionList.add(rcOmSnapshot);
}
-
- return v;
- });
+ }
Review Comment:
I figure I would outline two possible race condition scenarios without the
`dbMap` lock here. Though I believe both are already protected by my existing
implementation.
Case 1:
Since `pendingEvictionList` is a `Set`, it should be fine if multiple
threads are calling `release(key)` as the same time while some of those threads
observed that `rcOmSnapshot`'s ref count reached `0` at the same time. So the
logic here would still be correct in this case.
Case 2: Multiple threads interleaving `get()` and `release()`
1. Thread 1 executes `get(key)` right past
`rcOmSnapshot.incrementRefCount()` (after the change in this PR). ref count is
1 at this point.
2. Thread 2 executes `release(key)` past `rcOmSnapshot.decrementRefCount()`.
ref count is 0 at this point. (*)
3. Thread 2 continues to execute `pendingEvictionList.add(rcOmSnapshot)`,
pendingEvictionList now has the snapshot, if `cacheSizeLimit` is exceeded,
`cleanup()` will pick up the snapshot immediately.
4. Thread 1 executes `pendingEvictionList.remove(rcOmSnapshot)`. snapshot is
(supposed to be) removed from eviction list while the snapshot is already
removed from `dbMap` (cleaned up) by `cleanup()`.
(*) While actually Step 2 in Case 2 above would be prevented because
`decrementRefCount()` throws if a thread tries to `release()` a snapshot it had
not `get()` before, as a safety mechanism. And this safety, together with
atomic ref count should also prevent Case 1 from happening in the first place
(for which multiple threads would observe that ref count reached `0` at the
same time).
`cleanup()`, as of this time, is only called at the end of each `get()` and
`release()`. As it doesn't run in a seperate thread it should not pose extra
correctness issue at least for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]