[ 
https://issues.apache.org/jira/browse/HDDS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-5032:
-----------------------------
    Description: 
We have hit two cases of container loading exceptions. One case, fixed by
HDDS-4722, throws a RuntimeException. In the other, I backed up a container
directory under the name ContainerID-Backup, which triggers a badly formatted
container directory name exception.

The consequence in both cases is that the many remaining containers on the
same volume are not loaded. Although the DN starts and runs healthily, SCM
treats all of these container replicas as missing and starts to schedule many
replication tasks.

This task is to fix the issue. If loading a specific container throws an
exception, log it and move on to load the next container.


2021-03-12 20:46:16,420 [Thread-8] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data3/hdds/hdds {}
java.lang.NumberFormatException: For input string: "1823-raw"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerID(ContainerUtils.java:242)
        at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerFile(ContainerUtils.java:234)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:132)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
        at java.lang.Thread.run(Thread.java:748)
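
The proposed behaviour (log the failing container and continue with the next
one) might look roughly like the sketch below. This is only an illustration
and not the actual ContainerReader code: the class ContainerLoadSketch and the
methods parseContainerId and loadAll are hypothetical names, and a "container"
is reduced to the ID parsed from its directory name. The point is that the
try/catch sits inside the per-container loop, so a name such as "1823-raw"
costs one container rather than the rest of the volume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainerLoadSketch {

    // Hypothetical stand-in for the ID parsing step seen in the stack trace.
    // Long.parseLong throws NumberFormatException for names like "1823-raw".
    static long parseContainerId(String dirName) {
        return Long.parseLong(dirName);
    }

    // Loads every directory whose name parses; logs and skips the others.
    static List<Long> loadAll(List<String> dirNames) {
        List<Long> loaded = new ArrayList<>();
        for (String dirName : dirNames) {
            try {
                loaded.add(parseContainerId(dirName));
            } catch (RuntimeException e) {
                // Caught per container, so the loop moves on to the next one.
                System.err.println("Failed to load container directory '"
                    + dirName + "', skipping: " + e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("1822", "1823-raw", "1824");
        System.out.println(loadAll(dirs)); // prints [1822, 1824]
    }
}
```

With the catch placed outside the loop instead, the first bad directory name
would end the scan of the whole volume, which matches the symptom described
above.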


> DN stopped to load containers on volume after a container load exception
> ------------------------------------------------------------------------
>
>                 Key: HDDS-5032
>                 URL: https://issues.apache.org/jira/browse/HDDS-5032
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Critical


