[ https://issues.apache.org/jira/browse/HDDS-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika updated HDDS-13928:
-------------------------------
    Description: 
Thought of one case that might cause orphan blocks
1. A CLOSED Container 1 contains replicas in [DN1, DN2, DN3]
2. A delete transaction is created for Container 1 blocks, but not yet sent
3. DN1 is marked as DEAD and SCM removes the container replica
4. SCM replicates the Container 1 to DN4 
5. Delete commands are sent to [DN2, DN3, DN4]
6. DN2, DN3, DN4 finish the deletion and acknowledge it to SCM
7. SCM removes the delete transaction
8. DN1 comes back alive (resurrected)
9. The over-replicated replica on DN4 is removed, which restores the 
original 3 replicas from step 1.

Notice that since the deletion transaction has been removed, the undeleted 
blocks in DN1 will be orphaned and will never be deleted. 
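
The steps above can be replayed with a small toy model. This is only a sketch to make the timeline concrete, not Ozone code: the block names (b1, b2, b3), the dictionaries, and the run_scenario function are all hypothetical, and SCM state is reduced to a map of replicas plus one pending delete transaction.

```python
# Toy replay of steps 1-9. Nothing here is an Ozone API; every name is
# a hypothetical stand-in used only for illustration.
CONTAINER_BLOCKS = {"b1", "b2", "b3"}
BLOCKS_TO_DELETE = {"b1", "b2"}


def run_scenario():
    # Step 1: CLOSED container with replicas on DN1, DN2, DN3.
    replicas = {dn: set(CONTAINER_BLOCKS) for dn in ("DN1", "DN2", "DN3")}
    # Step 2: delete transaction created, but not yet sent.
    pending_txn = set(BLOCKS_TO_DELETE)
    # Step 3: DN1 is marked DEAD; SCM forgets its replica, but the data
    # is still on DN1's disk.
    offline = {"DN1": replicas.pop("DN1")}
    # Step 4: SCM replicates the container to DN4 from a healthy replica.
    replicas["DN4"] = set(replicas["DN2"])
    # Steps 5-6: delete commands reach the replicas SCM currently knows
    # (DN2, DN3, DN4), which delete the blocks and acknowledge.
    for dn in replicas:
        replicas[dn] -= pending_txn
    # Step 7: every known replica acknowledged, so SCM drops the transaction.
    pending_txn = None
    # Step 8: DN1 comes back with its old, untouched data.
    replicas["DN1"] = offline.pop("DN1")
    # Step 9: the container is over-replicated, so the DN4 replica is removed.
    del replicas["DN4"]
    return replicas, pending_txn


replicas, pending_txn = run_scenario()
# Blocks that exist only on DN1 and that no transaction will ever delete.
orphans = replicas["DN1"] - replicas["DN2"]
print(sorted(orphans))  # ['b1', 'b2']
```

In this model the orphaned set is exactly the set of blocks covered by the delete transaction that DN1 never received.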

It seems that we already have a way to handle this: setting 
hdds.scm.unknown-container.action to DELETE (the default is WARN).

However, currently this does not seem to be enough, since 
1. The container is not deleted unless all of its data blocks are deleted, so 
if only some blocks are deleted, no orphan block deletion is triggered.
2. SCM seems to handle this only for FCR (ContainerReportHandler) and not for 
ICR (IncrementalContainerReportHandler)
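
The two gaps can be illustrated with another toy decision function. This is a sketch under the assumptions stated in this description, not the actual ContainerReportHandler logic: handle_report, its arguments, and the "KEEP" and "IGNORED" return values are hypothetical, and only the WARN and DELETE action names come from the configuration key above.

```python
def handle_report(known_containers, container_id, report_type, action="WARN"):
    """Toy model of the unknown-container action, as described above.

    Hypothetical function: it only encodes the two observations from
    this issue and does not mirror the real SCM code paths.
    """
    # Gap 2 (as observed above): only full container reports (FCR)
    # reach the unknown-container handling, incremental ones (ICR) do not.
    if report_type != "FCR":
        return "IGNORED"
    # Gap 1: a container that SCM still knows about is never treated as
    # unknown, so stale blocks inside it are not examined at all.
    if container_id in known_containers:
        return "KEEP"
    # Only a container that SCM has fully deleted gets the configured action.
    return action


# Partial delete: container 1 still exists in SCM, so DN1's replica is kept
# together with its orphan blocks, even with the action set to DELETE.
print(handle_report({1}, 1, "FCR", action="DELETE"))    # KEEP
# Full delete: SCM no longer knows container 1, so the action applies.
print(handle_report(set(), 1, "FCR", action="DELETE"))  # DELETE
# Same situation reported through an ICR: nothing happens in this model.
print(handle_report(set(), 1, "ICR", action="DELETE"))  # IGNORED
```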

  was:
Thought of one case that might cause orphan blocks
1. A CLOSED Container 1 contains replicas in [DN1, DN2, DN3]
2. A delete transaction is created for Container 1 blocks, but not yet sent
3. DN1 is marked as DEAD and SCM removes the container replica
4. SCM replicates the Container 1 to DN4 
5. Delete commands are sent to [DN2, DN3, DN4]
6. DN2, DN3, DN4 finish the deletion and acknowledge it to SCM
7. SCM removes the delete transaction
8. DN1 comes back alive (resurrected)
9. The over-replicated replica on DN4 is removed, which restores the 
original 3 replicas from step 1.

Notice that since the deletion transaction has been removed, the undeleted 
blocks in DN1 will be orphaned and will never be deleted. 

We need a way to handle this case.


> Cleanup orphan blocks on resurrected datanode
> ---------------------------------------------
>
>                 Key: HDDS-13928
>                 URL: https://issues.apache.org/jira/browse/HDDS-13928
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Priority: Major
>
> Thought of one case that might cause orphan blocks
> 1. A CLOSED Container 1 contains replicas in [DN1, DN2, DN3]
> 2. A delete transaction is created for Container 1 blocks, but not yet sent
> 3. DN1 is marked as DEAD and SCM removes the container replica
> 4. SCM replicates the Container 1 to DN4 
> 5. Delete commands are sent to [DN2, DN3, DN4]
> 6. DN2, DN3, DN4 finish the deletion and acknowledge it to SCM
> 7. SCM removes the delete transaction
> 8. DN1 comes back alive (resurrected)
> 9. The over-replicated replica on DN4 is removed, which restores the 
> original 3 replicas from step 1.
> Notice that since the deletion transaction has been removed, the undeleted 
> blocks in DN1 will be orphaned and will never be deleted. 
> It seems that we already have a way to handle this: setting 
> hdds.scm.unknown-container.action to DELETE (the default is WARN).
> However, currently this does not seem to be enough, since 
> 1. The container is not deleted unless all of its data blocks are deleted, 
> so if only some blocks are deleted, no orphan block deletion is triggered.
> 2. SCM seems to handle this only for FCR (ContainerReportHandler) and not 
> for ICR (IncrementalContainerReportHandler)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
