[ 
https://issues.apache.org/jira/browse/HDDS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Rose updated HDDS-6235:
-----------------------------
    Description: 
An empty KeyValueContainer will have an empty chunks directory. 
TarContainerPacker#pack recurses into directories adding files into containers, 
but if the chunks directory is empty, it will not be included in the tar. The 
receiver will unpack the tar successfully, but the resulting container will not 
have a chunks directory. After this, the container will not be able to be 
replicated further, as the tar packing step requires all container pieces to be 
present on disk. This issue is more likely to occur due to HDDS-5359, which 
causes many empty containers to be tracked by SCM indefinitely.
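
The packing behavior described above can be illustrated with a minimal sketch. 
This is not the TarContainerPacker code: the class and method names are 
hypothetical, "descriptor.txt" is a made-up stand-in for the other container 
files, and a plain list of entry names stands in for the tar stream. It only 
shows why a recursion that emits entries for files alone drops an empty 
directory, and how emitting an entry for each directory keeps it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; a list of entry names stands in for a tar archive.
final class EmptyDirPackingSketch {

  // Lists the children of a directory in sorted order so results are stable.
  static List<Path> children(Path dir) throws IOException {
    try (Stream<Path> s = Files.list(dir)) {
      return s.sorted().collect(Collectors.toList());
    }
  }

  // Recurses into directories but records files only, so an empty
  // directory contributes no entry at all.
  static List<String> fileEntriesOnly(Path root, Path dir) throws IOException {
    List<String> entries = new ArrayList<>();
    for (Path child : children(dir)) {
      if (Files.isDirectory(child)) {
        entries.addAll(fileEntriesOnly(root, child));
      } else {
        entries.add(root.relativize(child).toString());
      }
    }
    return entries;
  }

  // Records an entry for every directory before recursing, so an empty
  // directory still appears in the result.
  static List<String> withDirectoryEntries(Path root, Path dir) throws IOException {
    List<String> entries = new ArrayList<>();
    for (Path child : children(dir)) {
      if (Files.isDirectory(child)) {
        entries.add(root.relativize(child).toString() + "/");
        entries.addAll(withDirectoryEntries(root, child));
      } else {
        entries.add(root.relativize(child).toString());
      }
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    // Assumed layout: an empty "chunks" directory next to one regular file.
    Path root = Files.createTempDirectory("container");
    Files.createDirectory(root.resolve("chunks"));
    Files.createFile(root.resolve("descriptor.txt"));
    System.out.println(fileEntriesOnly(root, root));
    System.out.println(withDirectoryEntries(root, root));
  }
}
```

In this sketch the first list omits the chunks directory and the second 
includes it, which mirrors the difference between the container on the sender 
and the one the receiver ends up with.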

Since the issue only affects empty containers, there does not appear to be any 
data loss risk, even though the container scanner may detect it as 
"corruption". The issue may manifest as a container continuously attempting to 
be replicated and failing, or the container being marked unhealthy by the 
background container scanner (if it is enabled).

This Jira will fix the issue with the tar packer, and also add a repair step on 
datanode startup to create the chunks directory for containers that do not have 
one. This step should be a quick addition to datanode startup that already 
iterates all the containers, and should not impact startup time.
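
A rough sketch of what such a repair step might look like follows. It is an 
assumption-laden illustration, not the actual patch: the class name, the 
method name, the "chunks" directory name constant, and the idea of passing in 
a list of container root paths are all hypothetical. The point is only that 
the check is one existence test per container inside a loop that already runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration of a startup repair pass.
final class ChunksDirRepairSketch {

  // Assumed directory name; the real layout may differ.
  static final String CHUNKS_DIR = "chunks";

  // Creates the chunks directory for each container root that lacks one
  // and returns how many containers were repaired.
  static int repairMissingChunksDirs(List<Path> containerRoots) throws IOException {
    int repaired = 0;
    for (Path root : containerRoots) {
      Path chunks = root.resolve(CHUNKS_DIR);
      if (Files.notExists(chunks)) {
        Files.createDirectories(chunks);
        repaired++;
      }
    }
    return repaired;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("containers");
    Path healthy = base.resolve("1");
    Files.createDirectories(healthy.resolve(CHUNKS_DIR));
    Path missing = Files.createDirectory(base.resolve("2"));
    System.out.println(repairMissingChunksDirs(List.of(healthy, missing)));
  }
}
```

Running the pass a second time over the same containers repairs nothing, so 
the step is safe to repeat on every startup.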

  was:
An empty KeyValueContainer will have an empty chunks directory. 
TarContainerPacker#pack recurses into directories adding files into containers, 
but if the chunks directory is empty, it will not be included in the tar. The 
receiver will unpack the tar successfully, but the resulting container will not 
have a chunks directory. After this, the container will not be able to be 
replicated further, as the tar packing step requires all container pieces to be 
present on disk. The container may also be marked unhealthy by the background 
container scanner. This issue is more likely to occur due to HDDS-5359, which 
causes many empty containers to be tracked by SCM indefinitely.

This Jira will fix the issue with the tar packer, and also add a repair step on 
datanode startup to create the chunks directory for containers that do not have 
one. This step should be a quick addition to datanode startup that already 
iterates all the containers, and should not impact startup time.


> Empty KeyValueContainers cannot be replicated
> ---------------------------------------------
>
>                 Key: HDDS-6235
>                 URL: https://issues.apache.org/jira/browse/HDDS-6235
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Major
>
> An empty KeyValueContainer will have an empty chunks directory. 
> TarContainerPacker#pack recurses into directories adding files into 
> containers, but if the chunks directory is empty, it will not be included in 
> the tar. The receiver will unpack the tar successfully, but the resulting 
> container will not have a chunks directory. After this, the container will 
> not be able to be replicated further, as the tar packing step requires all 
> container pieces to be present on disk. This issue is more likely to occur 
> due to HDDS-5359, which causes many empty containers to be tracked by SCM 
> indefinitely.
> Since the issue only affects empty containers, there does not appear to be 
> any data loss risk, even though the container scanner may detect it as 
> "corruption". The issue may manifest as a container continuously attempting 
> to be replicated and failing, or the container being marked unhealthy by the 
> background container scanner (if it is enabled).
> This Jira will fix the issue with the tar packer, and also add a repair step 
> on datanode startup to create the chunks directory for containers that do not 
> have one. This step should be a quick addition to datanode startup that 
> already iterates all the containers, and should not impact startup time.



