Re: [PR] HDDS-10590. [Snapshot] Synchronized snapshot purge, set snapshot property and SstFilteringService [ozone]

via GitHub Tue, 02 Apr 2024 18:14:07 -0700


swamirishi commented on code in PR #6456:
URL: https://github.com/apache/ozone/pull/6456#discussion_r1548656889



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -80,33 +87,62 @@ public OMClientResponse validateAndUpdateCache(OzoneManager 
ozoneManager, TermIn
       Map<String, SnapshotInfo> updatedPathPreviousAndGlobalSnapshots =
           new HashMap<>();
 
-      // Snapshots that are purged by the SnapshotDeletingService
-      // will update the next snapshot so that is can be deep cleaned
-      // by the KeyDeletingService in the next run.
+      // Each snapshot purge operation does three things:
+      //  1. Update the snapshot chain,
+      //  2. Update the deep clean flag for the next active snapshot (So that 
it can be
+      //     deep cleaned by the KeyDeletingService in the next run),
+      //  3. Finally, purge the snapshot.
+      // All of these steps have to be performed only when it acquires all the 
necessary
+      // locks (lock on the snapshot to be purged, lock on the next active 
snapshot, and
+      // lock on the next path and global previous snapshots). Ideally, there 
is no need
+      // for locks for snapshot purge and can rely on OMStateMachine because 
OMStateMachine
+      // is going to process each request sequentially.
+      //
+      // But there is a problem with that. After filtering unnecessary SST 
files for a snapshot,
+      // SstFilteringService updates that snapshot's SstFilter flag. 
SstFilteringService cannot
+      // use SetSnapshotProperty API because it runs on each OM independently 
and One OM does
+      // not know if the snapshot has been filtered on the other OM in HA 
environment.
+      //
+      // If locks are not taken snapshot purge and SstFilteringService will 
cause a race condition
+      // and override one's update with another.
       for (String snapTableKey : snapshotDbKeys) {
-        SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
-            .get(snapTableKey);
-
-        if (fromSnapshot == null) {
-          // Snapshot may have been purged in the previous iteration of 
SnapshotDeletingService.
-          LOG.warn("The snapshot {} is not longer in snapshot table, It maybe 
removed in the previous " +
-                  "Snapshot purge request.", snapTableKey);
-          continue;
-        }
+        // To acquire all the locks, a set is maintained which is keyed by 
snapshotTableKey.
+        // snapshotTableKey is nothing but /volumeName/bucketName/snapshotName.
+        // Once all the locks are acquired, it performs the three steps 
mentioned above and
+        // release all the locks after that.
+        Set<String> lockSet = new HashSet<>();

Review Comment:
   Since we know there are going to be only 2 elements in the set. Should we 
set the initial capacity to 2 and load factor as 1?
   ```suggestion
           Set<String> lockSet = new HashSet<>(2, 1.0);
   ```



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -120,20 +156,39 @@ public OMClientResponse 
validateAndUpdateCache(OzoneManager ozoneManager, TermIn
     return omClientResponse;
   }
 
+  private void acquireLock(Set<String> lockSet, String snapshotTableKey,
+                           OMMetadataManager omMetadataManager) throws 
IOException {
+    if (!lockSet.contains(snapshotTableKey)) {
+      Triple<String, String, String> triple = 
SnapshotUtils.getSnapshotTableKeyFromString(snapshotTableKey);

Review Comment:
   Why write another function? We can get this info from snapshotInfo ? 
OmMetadataManager should anyways have all snapshot info in memory.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -80,33 +87,62 @@ public OMClientResponse validateAndUpdateCache(OzoneManager 
ozoneManager, TermIn
       Map<String, SnapshotInfo> updatedPathPreviousAndGlobalSnapshots =
           new HashMap<>();
 
-      // Snapshots that are purged by the SnapshotDeletingService
-      // will update the next snapshot so that is can be deep cleaned
-      // by the KeyDeletingService in the next run.
+      // Each snapshot purge operation does three things:
+      //  1. Update the snapshot chain,
+      //  2. Update the deep clean flag for the next active snapshot (So that 
it can be
+      //     deep cleaned by the KeyDeletingService in the next run),
+      //  3. Finally, purge the snapshot.
+      // All of these steps have to be performed only when it acquires all the 
necessary
+      // locks (lock on the snapshot to be purged, lock on the next active 
snapshot, and
+      // lock on the next path and global previous snapshots). Ideally, there 
is no need
+      // for locks for snapshot purge and can rely on OMStateMachine because 
OMStateMachine
+      // is going to process each request sequentially.
+      //
+      // But there is a problem with that. After filtering unnecessary SST 
files for a snapshot,
+      // SstFilteringService updates that snapshot's SstFilter flag. 
SstFilteringService cannot
+      // use SetSnapshotProperty API because it runs on each OM independently 
and One OM does
+      // not know if the snapshot has been filtered on the other OM in HA 
environment.
+      //
+      // If locks are not taken snapshot purge and SstFilteringService will 
cause a race condition
+      // and override one's update with another.
       for (String snapTableKey : snapshotDbKeys) {
-        SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
-            .get(snapTableKey);
-
-        if (fromSnapshot == null) {
-          // Snapshot may have been purged in the previous iteration of 
SnapshotDeletingService.
-          LOG.warn("The snapshot {} is not longer in snapshot table, It maybe 
removed in the previous " +
-                  "Snapshot purge request.", snapTableKey);
-          continue;
-        }
+        // To acquire all the locks, a set is maintained which is keyed by 
snapshotTableKey.
+        // snapshotTableKey is nothing but /volumeName/bucketName/snapshotName.
+        // Once all the locks are acquired, it performs the three steps 
mentioned above and
+        // release all the locks after that.
+        Set<String> lockSet = new HashSet<>();
+        try {
+          acquireLock(lockSet, snapTableKey, omMetadataManager);

Review Comment:
   It would be better we take lock in the sorted order of snapTableKey



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java:
##########
@@ -171,58 +226,63 @@ private void updateSnapshotChainAndCache(
       return;
     }
 
-    // Updates next path snapshot's previous snapshot ID
+    String nextPathSnapshotKey = null;
+
     if (hasNextPathSnapshot) {
       UUID nextPathSnapshotId = snapshotChainManager.nextPathSnapshot(
           snapInfo.getSnapshotPath(), snapInfo.getSnapshotId());
-
-      String snapshotTableKey = snapshotChainManager
+      nextPathSnapshotKey = snapshotChainManager
           .getTableKey(nextPathSnapshotId);
-      nextPathSnapInfo = metadataManager.getSnapshotInfoTable()
-          .get(snapshotTableKey);
-      if (nextPathSnapInfo != null) {
-        nextPathSnapInfo.setPathPreviousSnapshotId(
-            snapInfo.getPathPreviousSnapshotId());
-        metadataManager.getSnapshotInfoTable().addCacheEntry(
-            new CacheKey<>(nextPathSnapInfo.getTableKey()),
-            CacheValue.get(trxnLogIndex, nextPathSnapInfo));
-        updatedPathPreviousAndGlobalSnapshots
-            .put(nextPathSnapInfo.getTableKey(), nextPathSnapInfo);
-      }
+
+      // Acquire lock from the snapshot
+      acquireLock(lockSet, nextPathSnapshotKey, metadataManager);
     }
 
-    // Updates next global snapshot's previous snapshot ID
+    String nestGlobalSnapshotKey = null;
     if (hasNextGlobalSnapshot) {
-      UUID nextGlobalSnapshotId =
-          snapshotChainManager.nextGlobalSnapshot(snapInfo.getSnapshotId());
-
-      String snapshotTableKey = snapshotChainManager
-          .getTableKey(nextGlobalSnapshotId);
-
-      SnapshotInfo nextGlobalSnapInfo = metadataManager.getSnapshotInfoTable()
-          .get(snapshotTableKey);
-      // If both next global and path snapshot are same, it may overwrite
-      // nextPathSnapInfo.setPathPreviousSnapshotID(), adding this check
-      // will prevent it.
-      if (nextGlobalSnapInfo != null && nextPathSnapInfo != null &&
-          nextGlobalSnapInfo.getSnapshotId().equals(
-              nextPathSnapInfo.getSnapshotId())) {
-        nextPathSnapInfo.setGlobalPreviousSnapshotId(
-            snapInfo.getGlobalPreviousSnapshotId());
-        metadataManager.getSnapshotInfoTable().addCacheEntry(
-            new CacheKey<>(nextPathSnapInfo.getTableKey()),
-            CacheValue.get(trxnLogIndex, nextPathSnapInfo));
-        updatedPathPreviousAndGlobalSnapshots
-            .put(nextPathSnapInfo.getTableKey(), nextPathSnapInfo);
-      } else if (nextGlobalSnapInfo != null) {
-        nextGlobalSnapInfo.setGlobalPreviousSnapshotId(
-            snapInfo.getGlobalPreviousSnapshotId());
-        metadataManager.getSnapshotInfoTable().addCacheEntry(
-            new CacheKey<>(nextGlobalSnapInfo.getTableKey()),
-            CacheValue.get(trxnLogIndex, nextGlobalSnapInfo));
-        updatedPathPreviousAndGlobalSnapshots
-            .put(nextGlobalSnapInfo.getTableKey(), nextGlobalSnapInfo);
-      }
+      UUID nextGlobalSnapshotId = 
snapshotChainManager.nextGlobalSnapshot(snapInfo.getSnapshotId());
+      nestGlobalSnapshotKey = 
snapshotChainManager.getTableKey(nextGlobalSnapshotId);

Review Comment:
   nextGlobalSnapshotKey*



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-10590. [Snapshot] Synchronized snapshot purge, set snapshot property and SstFilteringService [ozone]

Reply via email to