[
https://issues.apache.org/jira/browse/HDDS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700432#comment-17700432
]
Hemant Kumar commented on HDDS-8069:
------------------------------------
Problem: As part of jira-HDDS-7873, we added an optimization to early prune out
SST files that won't be needed for compaction DAG base snap diff. More details
in [PR-4235|https://github.com/apache/ozone/pull/4235].
Files are removed from back dir but compaction logs don't get updated because
those are needed for DAG traversal based diffing. On OM restart, we count
number of keys in the file present in compaction log. But it is possible that
file has deleted due to above optimization which is causing OM crash.
Fix: Om restart, log the exception instead throwing exception.
> [Snapshot] Compaction DAG SST cleanup potentially crashing OM on startup
> ------------------------------------------------------------------------
>
> Key: HDDS-8069
> URL: https://issues.apache.org/jira/browse/HDDS-8069
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Dave Teng
> Assignee: Hemant Kumar
> Priority: Blocker
>
> Sometimes if I restart the OM, the OM will go down. I check the OM log and
> saw error:
> {code}
> OM start failed with exception
> java.lang.RuntimeException: Can't find SST file: 000073.sst
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getAbsoluteSstFilePath(RocksDBCheckpointDiffer.java:561)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTFileSummary(RocksDBCheckpointDiffer.java:541)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.addNodeToDAG(RocksDBCheckpointDiffer.java:1004)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.lambda$populateCompactionDAG$2(RocksDBCheckpointDiffer.java:1033)
> at
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.populateCompactionDAG(RocksDBCheckpointDiffer.java:1032)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.processCompactionLogLine(RocksDBCheckpointDiffer.java:677)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.readCompactionLogToDAG(RocksDBCheckpointDiffer.java:691)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.loadAllCompactionLogs(RocksDBCheckpointDiffer.java:712)
> at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:166)
> at
> org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:219)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:481)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:465)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.start(OmMetadataManagerImpl.java:457)
> at
> org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:295)
> at
> org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:743)
> at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:623)
> at
> org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:708)
> at
> org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
> at
> org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
> at
> org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
> at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
> at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
> at picocli.CommandLine.access$1300(CommandLine.java:145)
> at
> picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
> at
> picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
> at picocli.CommandLine.execute(CommandLine.java:2078)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
> at
> org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]