[ 
https://issues.apache.org/jira/browse/HDDS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700432#comment-17700432
 ] 

Hemant Kumar edited comment on HDDS-8069 at 3/15/23 12:08 AM:
--------------------------------------------------------------

Problem: As part of HDDS-7873, we added an optimization that prunes, ahead of 
time, SST files that are no longer needed for compaction-DAG-based snapshot 
diff. More details in [PR-4235|https://github.com/apache/ozone/pull/4235].
The files are removed from the backup directory, but the compaction logs are 
not updated because they are still needed for DAG-traversal-based diffing. On 
OM restart, we count the number of keys in each file listed in the compaction 
log. However, a listed file may already have been deleted by the above 
optimization, and that is what causes the OM crash.

Fix: On OM restart, log the exception instead of throwing it.
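
To make the intended behavior concrete, here is a minimal, self-contained sketch of "log instead of throw" while replaying compaction-log entries. It is not the actual patch: the class TolerantCompactionLogLoader, the methods countKeys and loadNodes, the UNKNOWN_KEY_COUNT placeholder, and the one-key-per-line file format are all assumptions made for illustration. Real SST files are binary and are read through RocksDB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the Ozone code base.
public class TolerantCompactionLogLoader {
  private static final Logger LOG =
      Logger.getLogger(TolerantCompactionLogLoader.class.getName());

  // Assumed placeholder recorded when a file's key count cannot be read.
  static final long UNKNOWN_KEY_COUNT = -1L;

  // Stand-in for reading the key count of an SST file.
  // Assumption for this sketch only: one key per line of text.
  static long countKeys(Path sstFile) {
    if (!Files.exists(sstFile)) {
      throw new RuntimeException("Can't find SST file: " + sstFile.getFileName());
    }
    try (Stream<String> lines = Files.lines(sstFile)) {
      return lines.count();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Replays the file names listed in a compaction log. A file that was
  // pruned earlier (or cannot be read) is logged and recorded with a
  // placeholder count, so startup continues instead of aborting.
  static Map<String, Long> loadNodes(Path backupDir, List<String> fileNames) {
    Map<String, Long> nodes = new LinkedHashMap<>();
    for (String name : fileNames) {
      long keys;
      try {
        keys = countKeys(backupDir.resolve(name));
      } catch (RuntimeException e) {
        LOG.warning("Skipping key count for " + name + ": " + e.getMessage());
        keys = UNKNOWN_KEY_COUNT;
      }
      nodes.put(name, keys);
    }
    return nodes;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("sst-backup");
    Files.write(dir.resolve("000072.sst"), Arrays.asList("k1", "k2", "k3"));
    // 000073.sst is listed in the "compaction log" but missing on disk.
    System.out.println(loadNodes(dir, Arrays.asList("000072.sst", "000073.sst")));
  }
}
```

In this sketch the missing file still becomes a node, because the compaction log entry is what DAG traversal needs; only its key count is unknown.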


> [Snapshot] Compaction DAG SST cleanup potentially crashing OM on startup
> ------------------------------------------------------------------------
>
>                 Key: HDDS-8069
>                 URL: https://issues.apache.org/jira/browse/HDDS-8069
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Dave Teng
>            Assignee: Hemant Kumar
>            Priority: Blocker
>
> Sometimes when I restart the OM, it goes down. I checked the OM log and 
> saw this error:
> {code}
> OM start failed with exception
> java.lang.RuntimeException: Can't find SST file: 000073.sst
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getAbsoluteSstFilePath(RocksDBCheckpointDiffer.java:561)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTFileSummary(RocksDBCheckpointDiffer.java:541)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.addNodeToDAG(RocksDBCheckpointDiffer.java:1004)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.lambda$populateCompactionDAG$2(RocksDBCheckpointDiffer.java:1033)
>       at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.populateCompactionDAG(RocksDBCheckpointDiffer.java:1032)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.processCompactionLogLine(RocksDBCheckpointDiffer.java:677)
>       at java.util.Iterator.forEachRemaining(Iterator.java:116)
>       at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>       at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.readCompactionLogToDAG(RocksDBCheckpointDiffer.java:691)
>       at org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.loadAllCompactionLogs(RocksDBCheckpointDiffer.java:712)
>       at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:166)
>       at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:219)
>       at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:481)
>       at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:465)
>       at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.start(OmMetadataManagerImpl.java:457)
>       at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:295)
>       at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:743)
>       at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:623)
>       at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:708)
>       at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
>       at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
>       at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
>       at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
>       at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
>       at picocli.CommandLine.access$1300(CommandLine.java:145)
>       at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
>       at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
>       at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
>       at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
>       at picocli.CommandLine.execute(CommandLine.java:2078)
>       at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
>       at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
>       at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> {code}


