[ 
https://issues.apache.org/jira/browse/HDDS-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883060#comment-17883060
 ] 

Tsz-wo Sze commented on HDDS-11470:
-----------------------------------

Below is an example in which om2 replied "Completed INSTALL_SNAPSHOT" even 
though it had actually failed to move the downloaded DB checkpoint. The move 
failed with a FileSystemException (raised via UnixException), "Invalid 
cross-device link".
{code}
2024-09-18 09:28:20,365 INFO [grpc-default-executor-1]-org.apache.ratis.grpc.server.GrpcServerProtocolService: om2: Completed INSTALL_SNAPSHOT, lastReply: null
2024-09-18 09:28:20,365 INFO [pool-33-thread-1]-org.apache.hadoop.ozone.om.OzoneManager: metadataManager is stopped. Spend 7 ms.
2024-09-18 09:28:20,367 ERROR [pool-33-thread-1]-org.apache.hadoop.ozone.om.OzoneManager: Failed to move downloaded DB checkpoint /var/lib/hadoop-ozone/om/ozone-metaot/om.db.candidate to metadata directory /ozone/hadoop-ozone/om/data/om.db. Exception: {}. Resetting to original DB.
java.nio.file.FileSystemException: /ozone/hadoop-ozone/om/data/om.db/000044.sst -> /var/lib/hadoop-ozone/om/ozone-metadata/snapshot/om.db.candidate/000044.sst: Invalid cross-device link
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:477)
        at java.base/java.nio.file.Files.createLink(Files.java:1101)
        at org.apache.hadoop.ozone.om.snapshot.OmSnapshotUtils.linkFiles(OmSnapshotUtils.java:169)
        at org.apache.hadoop.ozone.om.OzoneManager.moveCheckpointFiles(OzoneManager.java:3884)
        at org.apache.hadoop.ozone.om.OzoneManager.replaceOMDBWithCheckpoint(OzoneManager.java:3864)
        at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3738)
        at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3673)
        at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3650)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$5(OzoneManagerStateMachine.java:505)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{code}
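
The pattern behind this log can be reduced to a small, self-contained sketch. It is not the Ozone or Ratis code: the class InstallSnapshotSketch, the method replaceDb, and the String reply are all hypothetical stand-ins, and the only assumption carried over from the trace is that the install runs inside a CompletableFuture supplier. The first variant swallows the exception, so the caller still sees a successful reply. The second rethrows, so the future completes exceptionally and the caller can tell that the install failed.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class InstallSnapshotSketch {
  static final String COMPLETED = "Completed INSTALL_SNAPSHOT";

  /** Hypothetical stand-in for replaceOMDBWithCheckpoint; fails on request. */
  static void replaceDb(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("Invalid cross-device link");
    }
  }

  /** Swallowing variant: the error is only printed, so the reply is still "completed". */
  static CompletableFuture<String> installSwallowing(boolean fail) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        replaceDb(fail);
      } catch (IOException e) {
        System.err.println("Failed to install snapshot: " + e.getMessage());
      }
      return COMPLETED;
    });
  }

  /** Propagating variant: the error is rethrown, so the future completes exceptionally. */
  static CompletableFuture<String> installPropagating(boolean fail) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        replaceDb(fail);
      } catch (IOException e) {
        // Checked exceptions cannot escape a Supplier, so wrap the cause.
        throw new CompletionException(e);
      }
      return COMPLETED;
    });
  }

  public static void main(String[] args) {
    // The failure is invisible to the caller.
    System.out.println(installSwallowing(true).join());

    // The failure reaches the caller, which can choose a different reply.
    String outcome = installPropagating(true)
        .handle((reply, error) ->
            error == null ? reply : "FAILED: " + error.getCause().getMessage())
        .join();
    System.out.println(outcome);
  }
}
```

With the propagating variant, whatever sends the reply can check for an exceptional completion before reporting success. How the real state machine should map that failure onto an install-snapshot reply is left open here.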


> OM should not reply Completed INSTALL_SNAPSHOT when installCheckpoint failed
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-11470
>                 URL: https://issues.apache.org/jira/browse/HDDS-11470
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM HA
>            Reporter: Tsz-wo Sze
>            Priority: Major
>
> When OM fails to installCheckpoint (e.g. HDDS-10300), it should not reply 
> "Completed INSTALL_SNAPSHOT".
> In the code below, when there is an exception, it just prints an error message 
> and continues to reply "Completed INSTALL_SNAPSHOT".
> {code}
> //OzoneManager.installCheckpoint
>       try {
>         time = Time.monotonicNow();
>         dbBackup = replaceOMDBWithCheckpoint(lastAppliedIndex,
>             oldDBLocation, checkpointLocation);
>         term = checkpointTrxnInfo.getTerm();
>         lastAppliedIndex = checkpointTrxnInfo.getTransactionIndex();
>         LOG.info("Replaced DB with checkpoint from OM: {}, term: {}, " +
>             "index: {}, time: {} ms", leaderId, term, lastAppliedIndex,
>             Time.monotonicNow() - time);
>       } catch (Exception e) {
>         LOG.error("Failed to install Snapshot from {} as OM failed to replace" +
>             " DB with downloaded checkpoint. Reloading old OM state.",
>             leaderId, e);
>       }
> {code}


