swamirishi commented on code in PR #9129:
URL: https://github.com/apache/ozone/pull/9129#discussion_r2427518842
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java:
##########
@@ -288,24 +294,17 @@ private void transferSnapshotData(Set<String> sstFilesToExclude, Path tmpdir, Se
       AtomicLong maxTotalSstSize, ArchiveOutputStream<TarArchiveEntry> archiveOutputStream,
       Map<String, String> hardLinkFileMap) throws IOException {
     OzoneManager om = (OzoneManager) getServletContext().getAttribute(OzoneConsts.OM_CONTEXT_ATTRIBUTE);
-    OMMetadataManager omMetadataManager = om.getMetadataManager();
     for (Path snapshotDir : snapshotPaths) {
       String snapshotId = OmSnapshotManager.extractSnapshotIDFromCheckpointDirName(snapshotDir.toString());
-      omMetadataManager.getLock().acquireReadLock(SNAPSHOT_DB_LOCK, snapshotId);
-      try {
-        // invalidate closes the snapshot DB
-        om.getOmSnapshotManager().invalidateCacheEntry(UUID.fromString(snapshotId));
-        writeDBToArchive(sstFilesToExclude, snapshotDir, maxTotalSstSize, archiveOutputStream, tmpdir,
-            hardLinkFileMap, false);
-        Path snapshotLocalPropertyYaml = Paths.get(
-            OmSnapshotManager.getSnapshotLocalPropertyYamlPath(snapshotDir));
-        if (Files.exists(snapshotLocalPropertyYaml)) {
-          File yamlFile = snapshotLocalPropertyYaml.toFile();
-          hardLinkFileMap.put(yamlFile.getAbsolutePath(), yamlFile.getName());
-          linkAndIncludeFile(yamlFile, yamlFile.getName(), archiveOutputStream, tmpdir);
-        }
-      } finally {
-        omMetadataManager.getLock().releaseReadLock(SNAPSHOT_DB_LOCK, snapshotId);
+      // invalidate closes the snapshot DB
+      om.getOmSnapshotManager().invalidateCacheEntry(UUID.fromString(snapshotId));
Review Comment:
Please remove this. We should never call this: it can lead to unknown read
errors on the application side, and it is redundant given that we have already
taken a lock.
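For illustration only, with the stripe-wide snapshot cache lock already held by the caller, the loop body could reduce to something like the following. This is a sketch reconstructed purely from the diff context above, not the final code:

```java
// Sketch: per-snapshot locking and cache invalidation dropped; the caller's
// stripe-wide SnapshotCache lock is assumed to keep the snapshot DBs stable
// while they are archived.
for (Path snapshotDir : snapshotPaths) {
  writeDBToArchive(sstFilesToExclude, snapshotDir, maxTotalSstSize,
      archiveOutputStream, tmpdir, hardLinkFileMap, false);
  Path snapshotLocalPropertyYaml = Paths.get(
      OmSnapshotManager.getSnapshotLocalPropertyYamlPath(snapshotDir));
  if (Files.exists(snapshotLocalPropertyYaml)) {
    File yamlFile = snapshotLocalPropertyYaml.toFile();
    hardLinkFileMap.put(yamlFile.getAbsolutePath(), yamlFile.getName());
    linkAndIncludeFile(yamlFile, yamlFile.getName(), archiveOutputStream, tmpdir);
  }
}
```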
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java:
##########
@@ -243,22 +245,26 @@ public void writeDbDataToStream(HttpServletRequest request, OutputStream destina
     if (shouldContinue) {
       // we finished transferring files from snapshot DB's by now and
       // this is the last step where we transfer the active om.db contents
-      checkpoint = createAndPrepareCheckpoint(tmpdir, true);
-      // unlimited files as we want the Active DB contents to be transferred in a single batch
-      maxTotalSstSize.set(Long.MAX_VALUE);
-      Path checkpointDir = checkpoint.getCheckpointLocation();
-      writeDBToArchive(sstFilesToExclude, checkpointDir,
-          maxTotalSstSize, archiveOutputStream, tmpdir, hardLinkFileMap, false);
-      if (includeSnapshotData) {
-        Path tmpCompactionLogDir = tmpdir.resolve(getCompactionLogDir().getFileName());
-        Path tmpSstBackupDir = tmpdir.resolve(getSstBackupDir().getFileName());
-        writeDBToArchive(sstFilesToExclude, tmpCompactionLogDir, maxTotalSstSize, archiveOutputStream, tmpdir,
-            hardLinkFileMap, getCompactionLogDir(), false);
-        writeDBToArchive(sstFilesToExclude, tmpSstBackupDir, maxTotalSstSize, archiveOutputStream, tmpdir,
-            hardLinkFileMap, getSstBackupDir(), false);
-        // This is done to ensure all data to be copied correctly is flushed in the snapshot DB
-        transferSnapshotData(sstFilesToExclude, tmpdir, snapshotPaths, maxTotalSstSize,
-            archiveOutputStream, hardLinkFileMap);
+      SnapshotCache snapshotCache = om.getOmSnapshotManager().getSnapshotCache();
+      // lock is null when includeSnapshotData=false, no snapshot consistency needed
+      try (UncheckedAutoCloseableSupplier<OMLockDetails> lock = includeSnapshotData ? snapshotCache.lock() : null) {
Review Comment:
No, we don't need another SNAPSHOT_DB_LOCK read lock; that would lead to a
deadlock, because the snapshot cache lock (snapshotCache.lock()) already
acquires a write lock on the entire stripe.
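To illustrate the hazard with a generic, runnable example (plain java.util.concurrent, not Ozone's lock manager): a non-reentrant lock blocks forever when a read lock is requested while the write lock is held, which is the shape that stacking a SNAPSHOT_DB_LOCK read lock under the stripe-wide write lock would take.

```java
import java.util.concurrent.locks.StampedLock;

// Generic illustration only; NOT Ozone's lock implementation. StampedLock is
// non-reentrant, so a read-lock request made while the write lock is held
// never returns -- the analogue of taking SNAPSHOT_DB_LOCK under the stripe
// write lock.
public final class SelfDeadlockSketch {
  public static void main(String[] args) {
    StampedLock stripe = new StampedLock();   // stands in for one lock stripe
    long writeStamp = stripe.writeLock();     // analogue of snapshotCache.lock()
    System.out.println("stripe write lock held; requesting a read lock...");
    long readStamp = stripe.readLock();       // blocks forever
    stripe.unlockRead(readStamp);             // never reached
    stripe.unlock(writeStamp);
  }
}
```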
##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMDbCheckpointServletInodeBasedXfer.java:
##########
@@ -377,7 +377,6 @@ public void testSnapshotDBConsistency() throws Exception {
}
       Path snapshotDbDir = Paths.get(newDbDir.toPath().toString(), OM_SNAPSHOT_CHECKPOINT_DIR,
           OM_DB_NAME + "-" + snapshotToModify.getSnapshotId());
-      deleteWalFiles(snapshotDbDir);
Review Comment:
Same doubt as above.
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java:
##########
@@ -243,22 +245,26 @@ public void writeDbDataToStream(HttpServletRequest request, OutputStream destina
     if (shouldContinue) {
       // we finished transferring files from snapshot DB's by now and
       // this is the last step where we transfer the active om.db contents
-      checkpoint = createAndPrepareCheckpoint(tmpdir, true);
-      // unlimited files as we want the Active DB contents to be transferred in a single batch
-      maxTotalSstSize.set(Long.MAX_VALUE);
-      Path checkpointDir = checkpoint.getCheckpointLocation();
-      writeDBToArchive(sstFilesToExclude, checkpointDir,
-          maxTotalSstSize, archiveOutputStream, tmpdir, hardLinkFileMap, false);
-      if (includeSnapshotData) {
-        Path tmpCompactionLogDir = tmpdir.resolve(getCompactionLogDir().getFileName());
-        Path tmpSstBackupDir = tmpdir.resolve(getSstBackupDir().getFileName());
-        writeDBToArchive(sstFilesToExclude, tmpCompactionLogDir, maxTotalSstSize, archiveOutputStream, tmpdir,
-            hardLinkFileMap, getCompactionLogDir(), false);
-        writeDBToArchive(sstFilesToExclude, tmpSstBackupDir, maxTotalSstSize, archiveOutputStream, tmpdir,
-            hardLinkFileMap, getSstBackupDir(), false);
-        // This is done to ensure all data to be copied correctly is flushed in the snapshot DB
-        transferSnapshotData(sstFilesToExclude, tmpdir, snapshotPaths, maxTotalSstSize,
-            archiveOutputStream, hardLinkFileMap);
+      SnapshotCache snapshotCache = om.getOmSnapshotManager().getSnapshotCache();
+      // lock is null when includeSnapshotData=false, no snapshot consistency needed
Review Comment:
Let us modularise this function: from line number 223 to 241 the code should
be the same, the only difference being that we would not acquire a snapshot
cache lock or take an AOS DB checkpoint. Maybe we can create a refactor JIRA
for this; one possible shape is sketched below.
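A hypothetical shape for the extracted helper (the method name and parameter list are invented for illustration; the calls inside mirror this hunk, and the checkpoint type is assumed to be DBCheckpoint):

```java
// Hypothetical refactor sketch, not code from this PR. The caller decides
// whether to hold the snapshot cache lock and whether to transfer snapshot
// data; the shared checkpoint/archive steps live here.
private void archiveActiveDbContents(Set<String> sstFilesToExclude, Path tmpdir,
    AtomicLong maxTotalSstSize, ArchiveOutputStream<TarArchiveEntry> archiveOutputStream,
    Map<String, String> hardLinkFileMap, boolean includeSnapshotData) throws IOException {
  DBCheckpoint checkpoint = createAndPrepareCheckpoint(tmpdir, true);
  // unlimited size: the active DB contents go out in a single batch
  maxTotalSstSize.set(Long.MAX_VALUE);
  writeDBToArchive(sstFilesToExclude, checkpoint.getCheckpointLocation(),
      maxTotalSstSize, archiveOutputStream, tmpdir, hardLinkFileMap, false);
  if (includeSnapshotData) {
    writeDBToArchive(sstFilesToExclude, tmpdir.resolve(getCompactionLogDir().getFileName()),
        maxTotalSstSize, archiveOutputStream, tmpdir, hardLinkFileMap, getCompactionLogDir(), false);
    writeDBToArchive(sstFilesToExclude, tmpdir.resolve(getSstBackupDir().getFileName()),
        maxTotalSstSize, archiveOutputStream, tmpdir, hardLinkFileMap, getSstBackupDir(), false);
  }
}
```

The caller would then acquire the snapshot cache lock and call transferSnapshotData() only in the includeSnapshotData branch.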
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java:
##########
@@ -151,7 +151,6 @@ public void invalidate(UUID key) {
LOG.warn("SnapshotId: '{}' does not exist in snapshot cache.", k);
} else {
try {
- v.get().getMetadataManager().getStore().flushDB();
Review Comment:
We don't have to explicitly call flush, since db.close() will flush the WAL
to disk before closing.
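For reference, the pattern in plain RocksJava (a standalone sketch; the durability-on-close behaviour is the claim above restated, and the DB path is illustrative):

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Standalone sketch: write, then close with no explicit flush. Per the
// comment above, close() flushes the WAL, so the put survives a clean
// shutdown.
public final class CloseWithoutFlushSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/close-without-flush-demo")) {
      db.put("key".getBytes(), "value".getBytes());
      // no explicit db.flush(...) before close
    } // try-with-resources closes the DB here
  }
}
```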
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.