[ 
https://issues.apache.org/jira/browse/HDDS-12210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931336#comment-17931336
 ] 

Hemant Kumar commented on HDDS-12210:
-------------------------------------

This looks like the same issue as HDDS-12385. I looked at two examples, the 
one from the description and one other. In both, the snapshot was purged 
successfully in `OMSnapshotPurgeRequest`, but when SDS 
(SnapshotDeletingService) tried to open it again while the DoubleBuffer was 
being flushed, there was a failure.
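
As a rough illustration of how such an interleaving can end in a 
DirectoryNotEmptyException, here is a minimal single-threaded sketch. It does 
not use any Ozone or RocksDB code. The class name, the file names 
(`000001.sst`, `LOCK`), and the assumption that opening the snapshot DB 
creates a new file inside the checkpoint directory are all hypothetical and 
only stand in for the purge thread and the SDS thread:

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PurgeRaceSketch {

  /**
   * Replays one possible interleaving in a single thread.
   * Returns true if removing the directory failed because a file
   * appeared after the directory had been emptied.
   */
  static boolean deleteFailsWhenReaderInterleaves() throws IOException {
    // Hypothetical stand-in for a snapshot checkpoint directory.
    Path dir = Files.createTempDirectory("om.db-sketch");
    Path dataFile = Files.createFile(dir.resolve("000001.sst"));

    // Step 1 ("purge" side): the directory contents are removed first.
    Files.delete(dataFile);

    // Step 2 ("reader" side, assumption): opening the DB creates a new
    // file in the same directory before the purge side finishes.
    Path lockFile = Files.createFile(dir.resolve("LOCK"));

    try {
      // Step 3 ("purge" side): removing the directory itself now fails.
      Files.delete(dir);
      return false;
    } catch (DirectoryNotEmptyException e) {
      return true;
    } finally {
      // Clean up the temporary files created by this sketch.
      Files.deleteIfExists(lockFile);
      Files.deleteIfExists(dir);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("directory delete failed: "
        + deleteFailsWhenReaderInterleaves());
  }
}
```

Whether this is what actually happens inside OM would have to be confirmed 
against the Ozone code. The sketch only shows that a file created between 
"empty the directory" and "remove the directory" is enough to produce the 
exception seen in the DoubleBuffer log.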

A snapshot is purged successfully in validateAndUpdateCache of 
OMSnapshotPurgeRequest:
{code:java}
2025-02-04 12:09:55,573 INFO [OM StateMachine ApplyTransaction Thread - 
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest: 
Successfully executed snapshotPurgeRequest: {snapshotDBKeys: 
"/ota/ozdls3_ota_va14/cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae"
} along with updating 
snapshots:{/ota/ozdls3_ota_va20/cm-tmp-a03e2fc4-fdeb-4cab-aad7-9723b04ae12d=SnapshotInfo{snapshotId:
 '22028d7f-a1c4-4882-9787-f72dd27f335d', name: 
'cm-tmp-a03e2fc4-fdeb-4cab-aad7-9723b04ae12d', volumeName: 'ota', bucketName: 
'ozdls3_ota_va20', snapshotStatus: 'SNAPSHOT_DELETED', creationTime: 
'1737272460221', deletionTime: '1737275231780', pathPreviousSnapshotId: 
'85137feb-4dff-440e-baf9-c3bdcadfb798', globalPreviousSnapshotId: 
'18f25f59-346b-4e15-b9eb-ba56878579b9', snapshotPath: 'ota/ozdls3_ota_va20', 
checkpointDir: '-22028d7f-a1c4-4882-9787-f72dd27f335d', dbTxSequenceNumber: 
'5468255349', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-tmp-22dd7822-4db2-46c6-a122-c1e1a0e96993=SnapshotInfo{snapshotId:
 'dc5984d5-6f4f-4dee-bb49-e80391b2daa1', name: 
'cm-tmp-22dd7822-4db2-46c6-a122-c1e1a0e96993', volumeName: 'ota', bucketName: 
'ozdls3_ota_va14', snapshotStatus: 'SNAPSHOT_DELETED', creationTime: 
'1737298757547', deletionTime: '1737302311966', pathPreviousSnapshotId: 
'8cb8fa9b-0176-4a59-b9cc-ca4cb2838e3b', globalPreviousSnapshotId: 
'fd23933c-8c35-41f3-a61f-bf2bac9a6087', snapshotPath: 'ota/ozdls3_ota_va14', 
checkpointDir: '-dc5984d5-6f4f-4dee-bb49-e80391b2daa1', dbTxSequenceNumber: 
'5479427839', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-1546385766-1737773882172-11=SnapshotInfo{snapshotId: 
'62bf6f10-efeb-4112-a359-cf02fb98afd2', name: 'cm-1546385766-1737773882172-11', 
volumeName: 'ota', bucketName: 'ozdls3_ota_va14', snapshotStatus: 
'SNAPSHOT_ACTIVE', creationTime: '1738344122546', deletionTime: '-1', 
pathPreviousSnapshotId: 'a2f77d97-3931-4f39-8afb-f4c354765e0e', 
globalPreviousSnapshotId: '6184ea57-00df-469b-86bb-e9d71ab2e384', snapshotPath: 
'ota/ozdls3_ota_va14', checkpointDir: '-62bf6f10-efeb-4112-a359-cf02fb98afd2', 
dbTxSequenceNumber: '5804642318', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae=SnapshotInfo{snapshotId:
 '60e7673b-6a97-4960-a522-b63cb113d016', name: 
'cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae', volumeName: 'ota', bucketName: 
'ozdls3_ota_va14', snapshotStatus: 'SNAPSHOT_DELETED', creationTime: 
'1737255564931', deletionTime: '1737259016079', pathPreviousSnapshotId: 
'8cb8fa9b-0176-4a59-b9cc-ca4cb2838e3b', globalPreviousSnapshotId: 
'18f25f59-346b-4e15-b9eb-ba56878579b9', snapshotPath: 'ota/ozdls3_ota_va14', 
checkpointDir: '-60e7673b-6a97-4960-a522-b63cb113d016', dbTxSequenceNumber: 
'5466629436', deepClean: 'true', sstFiltered: 'false'}}.{code}

Later on, SnapshotDeletingService tried to open the same snapshot and failed:
{code:java}
2025-02-04 12:09:55,634 ERROR 
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
Failed to retrieve snapshot: 
/ota/ozdls3_ota_va14/cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae
java.io.IOException: Failed init RocksDB, db path : 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016,
 exception :org.rocksdb.RocksDBException Corruption: IO error: No such file or 
directory: While open a file for random read: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016/719149.ldb:
 No such file or directory in file 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016/MANIFEST-720881
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
        at 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:589)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:402)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:360)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
        at 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:616)
        at 
org.apache.hadoop.ozone.om.service.SnapshotDeletingService$SnapshotDeletingTask.call(SnapshotDeletingService.java:169)
        at 
org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
        at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
The corresponding DoubleBuffer log:
{code:java}
2025-02-04 12:09:55,921 ERROR 
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse:
 Failed to delete snapshot directory 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016
 for snapshot /ota/ozdls3_ota_va14/cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae
java.nio.file.DirectoryNotEmptyException: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016
        at 
sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
        at 
sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
        at java.nio.file.Files.delete(Files.java:1126)
        at org.apache.commons.io.FileUtils.delete(FileUtils.java:1175)
        at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1194)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.deleteCheckpointDirectory(OMSnapshotPurgeResponse.java:130)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.addToDBBatch(OMSnapshotPurgeResponse.java:100)
        at 
org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:382)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:220)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:381)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:324)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:297)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:262)
        at java.lang.Thread.run(Thread.java:750){code}
*Another example:*

The snapshot is purged successfully in validateAndUpdateCache of 
OMSnapshotPurgeRequest:
{code:java}
2025-02-04 12:55:32,375 INFO [OM StateMachine ApplyTransaction Thread - 
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotPurgeRequest: 
Successfully executed snapshotPurgeRequest: {snapshotDBKeys: 
"/ota/ozdls3_ota_va14/cm-tmp-2b51a42e-1ece-4b27-b7f8-4bb7ad550281"
} along with updating 
snapshots:{/ota/ozdls3_ota_va8/cm-1546385728-1735953302412-1=SnapshotInfo{snapshotId:
 '5c847ff0-8f6b-404b-9df9-d9092361cac7', name: 'cm-1546385728-1735953302412-1', 
volumeName: 'ota', bucketName: 'ozdls3_ota_va8', snapshotStatus: 
'SNAPSHOT_DELETED', creationTime: '1736011578605', deletionTime: 
'1736229626552', pathPreviousSnapshotId: 
'b7d61a28-3e57-427f-92fe-f5f4b7258621', globalPreviousSnapshotId: 
'b7d61a28-3e57-427f-92fe-f5f4b7258621', snapshotPath: 'ota/ozdls3_ota_va8', 
checkpointDir: '-5c847ff0-8f6b-404b-9df9-d9092361cac7', dbTxSequenceNumber: 
'5029594305', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-tmp-2b51a42e-1ece-4b27-b7f8-4bb7ad550281=SnapshotInfo{snapshotId:
 '6e45f5f1-51b6-4321-8f91-110326aa584d', name: 
'cm-tmp-2b51a42e-1ece-4b27-b7f8-4bb7ad550281', volumeName: 'ota', bucketName: 
'ozdls3_ota_va14', snapshotStatus: 'SNAPSHOT_DELETED', creationTime: 
'1736002766663', deletionTime: '1736002778619', pathPreviousSnapshotId: 
'd7ddb410-d32f-4b43-b2cc-8ac3509ad2de', globalPreviousSnapshotId: 
'b7d61a28-3e57-427f-92fe-f5f4b7258621', snapshotPath: 'ota/ozdls3_ota_va14', 
checkpointDir: '-6e45f5f1-51b6-4321-8f91-110326aa584d', dbTxSequenceNumber: 
'5029222483', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-1546385766-1737773882172-11=SnapshotInfo{snapshotId: 
'62bf6f10-efeb-4112-a359-cf02fb98afd2', name: 'cm-1546385766-1737773882172-11', 
volumeName: 'ota', bucketName: 'ozdls3_ota_va14', snapshotStatus: 
'SNAPSHOT_ACTIVE', creationTime: '1738344122546', deletionTime: '-1', 
pathPreviousSnapshotId: 'a2f77d97-3931-4f39-8afb-f4c354765e0e', 
globalPreviousSnapshotId: '6184ea57-00df-469b-86bb-e9d71ab2e384', snapshotPath: 
'ota/ozdls3_ota_va14', checkpointDir: '-62bf6f10-efeb-4112-a359-cf02fb98afd2', 
dbTxSequenceNumber: '5804642318', deepClean: 'true', sstFiltered: 'false'}, 
/ota/ozdls3_ota_va14/cm-tmp-7aff84ce-4530-4011-ba4c-8be8a192a437=SnapshotInfo{snapshotId:
 'e90f5cbf-4a4c-4111-b329-663f49c6bf1e', name: 
'cm-tmp-7aff84ce-4530-4011-ba4c-8be8a192a437', volumeName: 'ota', bucketName: 
'ozdls3_ota_va14', snapshotStatus: 'SNAPSHOT_DELETED', creationTime: 
'1736045969833', deletionTime: '1736045981689', pathPreviousSnapshotId: 
'd7ddb410-d32f-4b43-b2cc-8ac3509ad2de', globalPreviousSnapshotId: 
'1fe28717-9fa9-462b-9bcb-1bda77e865b3', snapshotPath: 'ota/ozdls3_ota_va14', 
checkpointDir: '-e90f5cbf-4a4c-4111-b329-663f49c6bf1e', dbTxSequenceNumber: 
'5035116883', deepClean: 'true', sstFiltered: 'false'}}. {code}
SnapshotDeletingService log:
{code:java}
2025-02-04 12:55:32,417 ERROR 
[SnapshotDeletingService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
Failed to retrieve snapshot: 
/ota/ozdls3_ota_va14/cm-tmp-2b51a42e-1ece-4b27-b7f8-4bb7ad550281
java.io.IOException: Failed init RocksDB, db path : 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-6e45f5f1-51b6-4321-8f91-110326aa584d,
 exception :org.rocksdb.RocksDBException Corruption: IO error: No such file or 
directory: While open a file for random read: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-6e45f5f1-51b6-4321-8f91-110326aa584d/685604.ldb:
 No such file or directory in file 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-6e45f5f1-51b6-4321-8f91-110326aa584d/MANIFEST-688430
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
        at 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:589)
        at 
org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:402)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:360)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$1(SnapshotCache.java:147)
        at 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1892)
        at 
org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:143)
        at 
org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:616)
        at 
org.apache.hadoop.ozone.om.service.SnapshotDeletingService$SnapshotDeletingTask.call(SnapshotDeletingService.java:169)
        at 
org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
        at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750) {code}
DoubleBuffer log:
{code:java}
2025-02-04 12:55:32,666 ERROR 
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse:
 Failed to delete snapshot directory 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-6e45f5f1-51b6-4321-8f91-110326aa584d
 for snapshot /ota/ozdls3_ota_va14/cm-tmp-2b51a42e-1ece-4b27-b7f8-4bb7ad550281
java.nio.file.DirectoryNotEmptyException: 
/data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-6e45f5f1-51b6-4321-8f91-110326aa584d
        at 
sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
        at 
sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
        at java.nio.file.Files.delete(Files.java:1126)
        at org.apache.commons.io.FileUtils.delete(FileUtils.java:1175)
        at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1194)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.deleteCheckpointDirectory(OMSnapshotPurgeResponse.java:130)
        at 
org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.addToDBBatch(OMSnapshotPurgeResponse.java:100)
        at 
org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:382)
        at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:220){code}

> Tarball creation interfering with snapshot purge 
> -------------------------------------------------
>
>                 Key: HDDS-12210
>                 URL: https://issues.apache.org/jira/browse/HDDS-12210
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Hemant Kumar
>            Assignee: Hemant Kumar
>            Priority: Major
>
> If tarball creation is in progress while the snapshot is getting purged, 
> the snapshot DB dir delete command fails. Because of that, the snapshot DB 
> dir lingers around even though the snapshot is purged from the 
> snapshotInfoTable, and manual intervention is needed to delete the dir. 
> {code}
> 2025-02-04 12:09:55,921 ERROR 
> [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse:
>  Failed to delete snapshot directory 
> /data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016
>  for snapshot /ota/ozdls3_ota_va14/cm-tmp-15d62c55-9463-4d5f-b724-3817cb623dae
> java.nio.file.DirectoryNotEmptyException: 
> /data/meta01/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-60e7673b-6a97-4960-a522-b63cb113d016
>         at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
>         at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
>         at java.nio.file.Files.delete(Files.java:1126)
>         at org.apache.commons.io.FileUtils.delete(FileUtils.java:1175)
>         at 
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1194)
>         at 
> org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.deleteCheckpointDirectory(OMSnapshotPurgeResponse.java:130)
>         at 
> org.apache.hadoop.ozone.om.response.snapshot.OMSnapshotPurgeResponse.addToDBBatch(OMSnapshotPurgeResponse.java:100)
>         at 
> org.apache.hadoop.ozone.om.response.OMClientResponse.checkAndUpdateDB(OMClientResponse.java:73)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$5(OzoneManagerDoubleBuffer.java:382)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:220)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatch(OzoneManagerDoubleBuffer.java:381)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushBatch(OzoneManagerDoubleBuffer.java:324)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushCurrentBuffer(OzoneManagerDoubleBuffer.java:297)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:262)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
> This task is to come up with an approach to get rid of manual intervention.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
