[ https://issues.apache.org/jira/browse/HDDS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310485#comment-17310485 ]
Wei-Chiu Chuang commented on HDDS-5032:
---------------------------------------
Yes... I forgot to mention that the outcome of the HDDS-4722 issue was a large number of missing
containers, which triggered a lot of container re-replication.
> DN stops loading containers on a volume after a container load exception
> ------------------------------------------------------------------------
>
> Key: HDDS-5032
> URL: https://issues.apache.org/jira/browse/HDDS-5032
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Critical
>
> We have hit two cases of container loading exceptions. One case, fixed by
> HDDS-4722, throws a RuntimeException. In the other, I backed up a
> container directory under the name ContainerID-Backup, which triggered a
> badly formatted container directory name exception.
> In both cases, the consequence is that all the remaining containers on the
> same volume are not loaded. Although the DN starts and runs healthily, SCM
> treats all of these container replicas as missing and starts to schedule many
> replica replication tasks.
> This task is to fix the issue: if loading a specific container throws an
> exception, log it and go on to load the next container.
> Case 1:
> 2021-03-12 20:46:16,420 [Thread-8] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data3/hdds/hdds {}
> java.lang.NumberFormatException: For input string: "1823-raw"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerID(ContainerUtils.java:242)
> at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerFile(ContainerUtils.java:234)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:132)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
> at java.lang.Thread.run(Thread.java:748)
> Case 2:
> 2021-03-25 10:15:47,502 [Thread-15] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data5/hdds/hdds {}
> org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
> at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
> at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
> at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
> at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
> at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaOneImpl.<init>(DatanodeStoreSchemaOneImpl.java:40)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:68)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:93)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:195)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:181)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:158)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:136)
> at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
> at java.lang.Thread.run(Thread.java:748)
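The proposed fix (log a container-specific loading exception and go on to the next container) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual Ozone patch: `VolumeLoadSketch`, `readVolume` and the `LongConsumer` loader callback are invented names, and container directory names are assumed to be plain numeric container IDs. The try/catch wraps each container individually, so a malformed directory name such as "1823-raw" (Case 1) or an unexpected runtime error from the loader (Case 2) only skips that one container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: per-container failure isolation while reading a volume. */
public class VolumeLoadSketch {
  private static final Logger LOG =
      Logger.getLogger(VolumeLoadSketch.class.getName());

  /** Returns the IDs of the containers that loaded successfully. */
  static List<Long> readVolume(List<String> containerDirNames,
                               LongConsumer loadContainer) {
    List<Long> loaded = new ArrayList<>();
    for (String dirName : containerDirNames) {
      try {
        // Case 1: a name like "1823-raw" throws NumberFormatException here.
        long containerId = Long.parseLong(dirName);
        // Case 2: the (hypothetical) loader may throw an unexpected RuntimeException.
        loadContainer.accept(containerId);
        loaded.add(containerId);
      } catch (RuntimeException e) {
        // Log and continue with the next container instead of abandoning the volume.
        LOG.log(Level.WARNING, "Skipping container directory " + dirName, e);
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    List<String> dirs = List.of("1821", "1823-raw", "1824", "1825");
    List<Long> loaded = readVolume(dirs, id -> {
      if (id == 1824L) {
        // Simulate a Case 2 style failure inside the loader.
        throw new IllegalStateException("Metrics source RDBMetrics already exists!");
      }
    });
    System.out.println(loaded); // prints [1821, 1825]
  }
}
```

Placing the catch inside the loop, rather than around the whole volume scan, is what keeps one bad entry from hiding every later container on that volume from SCM.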
--
This message was sent by Atlassian Jira
(v8.3.4#803005)