[
https://issues.apache.org/jira/browse/HDDS-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neil Joshi updated HDDS-6449:
-----------------------------
Description:
When SCM issues a delete command to a datanode, the datanode performs the
following steps:
writeLock()
1. The container is removed from the in-memory container set.
writeUnlock()
2. The container metadata directory is recursively deleted.
3. The container chunks directory is recursively deleted.
4. The datanode sets the container's in-memory state to DELETED.
   - This is purely for the incremental container report (ICR), as the
     container is no longer present in the container set.
5. The datanode sends an incremental container report to SCM with the new
   state.
   - The container has already been removed from the in-memory set at this
     point, so once the ICR is sent the container is unreachable.
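The sequence above can be sketched as a minimal, hypothetical Java model (the class, fields, and the simulated exception are illustrative, not the actual Ozone code) to show why a failure after step 1 strands state on disk:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal, hypothetical model of the datanode delete sequence; names are
// illustrative, not the actual Ozone classes.
public class ContainerDeleteSketch {
    final Map<Long, String> containerSet = new HashMap<>(); // id -> on-disk path
    boolean failMetadataDelete = false; // simulate the step 2 failure

    void deleteContainer(long id) throws Exception {
        // Step 1: the container leaves the in-memory set first.
        String path = containerSet.remove(id);
        // Steps 2-3: recursive directory deletes. A failure here leaves
        // partial state on disk with no in-memory handle left to retry.
        if (failMetadataDelete) {
            throw new Exception("simulated IOException deleting metadata of " + path);
        }
        // Steps 4-5: set state DELETED and send the ICR (elided).
    }

    public static void main(String[] args) {
        ContainerDeleteSketch dn = new ContainerDeleteSketch();
        dn.containerSet.put(1L, "/data/hdds/container1");
        dn.failMetadataDelete = true;
        try {
            dn.deleteContainer(1L);
        } catch (Exception e) {
            // The delete failed, yet the container is already unreachable:
            System.out.println(dn.containerSet.containsKey(1L)); // prints "false"
        }
    }
}
```

Once step 1 completes, no retry path exists: the exception unwinds, but the on-disk metadata and chunks directories are orphaned.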
In HDDS-6441, a failure in step 2 removed the .container file and the (unused)
db.checkpoints directory from the metadata directory, and the remaining steps
were not executed after the IOException was thrown during the delete. This
caused an error to be logged when the partial state was read on datanode
restart.
The current method of deleting containers provides no way to recover from or
retry a failed delete, because the container is removed from the in-memory set
as the very first step. This Jira aims to change the datanode delete steps so
that if a delete fails, the existing SCM container delete retry logic or the
datanode itself can eventually get the lingering state off the disk.
Proposed solution v1:
Link to a shareable Google doc describing a potential solution "to resolve the
datanode artifact issue by using a background failedContainerDelete thread that
is run on each datanode to cleanup failed container delete transactions.":
[https://docs.google.com/document/d/1ngRCbA_HxoNOof1kaiDuw0XYjJ2Z7t64ATF-V0TsJ-4/edit?usp=sharing]
Proposed solution v2:
Following discussions with Ethan, Ritesh, Sid, and Nanda, I have created an
updated proposed solution based on an atomic rename of containers on container
delete. The rename moves the container to a common cleanup path on each disk.
The Scrubber service is then modified to delete all files found in the cleanup
path. A draft design doc is in the shared Google doc:
https://docs.google.com/document/d/1Xt_x1Uhs4e1vJ6cJgokdlMxI0tRSxNBEkZlI9MXMzMg/edit?usp=sharing
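A rough sketch of the v2 idea in Java (directory layout, class, and method names here are hypothetical, not the proposed Ozone implementation): a rename within one filesystem is atomic, so a crash mid-delete leaves the container either fully in place or fully inside the cleanup path, where a scrubber-style pass can retry removal freely.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// Hypothetical sketch of the v2 proposal: atomically rename the container
// directory into a per-disk cleanup path, then let a scrubber-style pass
// delete everything under that path.
public class AtomicDeleteSketch {
    static Path markForCleanup(Path containerDir, Path cleanupDir) throws IOException {
        Files.createDirectories(cleanupDir);
        Path target = cleanupDir.resolve(containerDir.getFileName());
        // Atomic within one filesystem: the container either stays fully in
        // place or lands fully in the cleanup path, never half-deleted.
        return Files.move(containerDir, target, StandardCopyOption.ATOMIC_MOVE);
    }

    static void scrub(Path cleanupDir) throws IOException {
        // Partial deletes here are harmless and retryable: everything under
        // the cleanup path is already marked for removal.
        try (Stream<Path> entries = Files.walk(cleanupDir)) {
            entries.sorted(java.util.Comparator.reverseOrder()) // deepest first
                   .filter(p -> !p.equals(cleanupDir))
                   .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path disk = Files.createTempDirectory("disk");
        Path container = Files.createDirectories(disk.resolve("container1"));
        Files.createFile(container.resolve("chunk0"));
        Path cleanup = disk.resolve("cleanup");

        markForCleanup(container, cleanup);
        System.out.println(Files.exists(container)); // prints "false": fully moved
        scrub(cleanup);
        try (Stream<Path> s = Files.list(cleanup)) {
            System.out.println(s.count()); // prints "0": nothing left to clean
        }
    }
}
```

Because the ICR is only sent after the rename succeeds, SCM's existing delete retry logic would still cover a rename failure, while the scrubber covers delete failures after the rename.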
> Failed container delete can leave artifacts on disk
> ---------------------------------------------------
>
> Key: HDDS-6449
> URL: https://issues.apache.org/jira/browse/HDDS-6449
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 1.0.0, 1.1.0, 1.2.0
> Reporter: Ethan Rose
> Assignee: Neil Joshi
> Priority: Major
> Labels: HDDS-6449
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]