[
https://issues.apache.org/jira/browse/HDDS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719799#comment-17719799
]
Stephen O'Donnell commented on HDDS-8179:
-----------------------------------------
I mistakenly assumed that SCM thought the container was empty while the DN did
not, but that is not the case here. It is clear from the logs that the replicas
on SCM report keyCount=1, so the DN is correct to refuse the delete. The
question is then how the container got into a DELETING state with a keyCount !=
0. This is what I believe happened, or at least something along these lines:
# Container opened for a write
# Client starts writing a small key, smaller than a single stripe.
# Around the same time the client starts the write, the node is
decommissioned. The first step of decommissioning is to close the pipelines.
# The pipeline close moves the container into the CLOSING state and sends
close-container commands to the DNs. The DNs don't receive these until their
next heartbeat.
# One DN gets the close command, closes the container and sends an ICR. If
this replica is not data index 1 or one of the parity indexes, it can report
a closed container with zero keys. This ICR transitions the container to
CLOSED in SCM.
# RM runs, sees the container is CLOSED with a single known replica reporting
zero keys, and transitions the container to DELETING (see the sketch after
this list).
# The other replicas check in with a keyCount of 1, which updates the
container key count in SCM and leaves it in this unexpected state, but at this
stage RM has no way to transition the container back to CLOSED.
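To make the RM step above concrete, here is a minimal sketch (hypothetical
names, not the actual ReplicationManager code) of the kind of "all known
replicas are empty" check that can send a CLOSED container to DELETING when
the only replica reported so far happens to be an empty EC replica:
{code:java}
// Hypothetical sketch, not the real SCM code: if only one (empty) replica
// has reported, the container trivially "looks" empty.
import java.util.Set;

final class EmptyContainerCheckSketch {

  /** Minimal stand-in for the replica details SCM keeps per container. */
  record ReplicaView(int replicaIndex, long keyCount, long bytesUsed) {}

  /**
   * Returns true if every replica SCM currently knows about reports zero
   * keys. With a single empty non-key EC replica reported, this passes,
   * which is how the container in this issue was sent to DELETING.
   */
  static boolean looksEmpty(Set<ReplicaView> knownReplicas) {
    return !knownReplicas.isEmpty()
        && knownReplicas.stream().allMatch(r -> r.keyCount() == 0);
  }

  public static void main(String[] args) {
    // Only replica index 4 (a non-key EC replica) has reported so far,
    // and it holds no block files, so the container "looks" empty.
    Set<ReplicaView> reported = Set.of(new ReplicaView(4, 0, 0));
    System.out.println("Container looks empty: " + looksEmpty(reported));
  }
}
{code}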
One part of the problem is that we let the container go from CLOSING to CLOSED
too easily. For Ratis, there are checks so that only a replica with the latest
BCSID can transition the container to CLOSED. For EC, I think we need to add
logic so that only a "key" replica can drive the close. A key replica is
replicaIndex = 1 or any parity index, as those are guaranteed to hold a block
file for every block in the container; other replicas may be empty. We already
have logic to ensure keyCount and bytesUsed are only updated by a "key"
replica, so this would extend the same approach to the CLOSING to CLOSED
transition in AbstractContainerReportHandler. It should stop this incorrect
DELETING state from happening.
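As a rough sketch of that gate (illustrative names only, not the existing
AbstractContainerReportHandler API), only replicaIndex 1 or a parity index
would be allowed to drive CLOSING to CLOSED for an EC container:
{code:java}
// Hypothetical sketch of the proposed "key replica" gate.
final class EcCloseTransitionGateSketch {

  /**
   * @param replicaIndex index reported by the datanode (1-based)
   * @param dataNum      number of data replicas in the EC scheme, e.g. 3 for RS-3-2
   * @param parityNum    number of parity replicas, e.g. 2 for RS-3-2
   * @return true if this replica is guaranteed to hold a block file for every
   *         block in the container and so may transition CLOSING -> CLOSED
   */
  static boolean canTransitionToClosed(int replicaIndex, int dataNum, int parityNum) {
    boolean isFirstData = replicaIndex == 1;
    boolean isParity = replicaIndex > dataNum && replicaIndex <= dataNum + parityNum;
    return isFirstData || isParity;
  }

  public static void main(String[] args) {
    // For an RS-3-2 container: indexes 1, 4 and 5 are key replicas.
    for (int idx = 1; idx <= 5; idx++) {
      System.out.println("replicaIndex=" + idx + " may close: "
          + canTransitionToClosed(idx, 3, 2));
    }
  }
}
{code}
Assuming an RS-3-2 layout, as the five replica indexes in the logs below
suggest, an empty non-key replica like the one in the scenario above could no
longer trigger the premature CLOSED/DELETING transitions.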
> Datanode decommissioning blocked due to non-empty replica of deleting
> container
> -------------------------------------------------------------------------------
>
> Key: HDDS-8179
> URL: https://issues.apache.org/jira/browse/HDDS-8179
> Project: Apache Ozone
> Issue Type: Bug
> Components: ECOfflineRecovery, SCM
> Reporter: Varsha Ravi
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
>
> The Replication Manager is sending a delete container command to a non-empty
> container due to HDDS-7775. The container is not deleted, but *subsequent
> decommissioning calls to any of the DNs do not complete* because the
> container is in an under-replicated as well as an unhealthy state.
> *SCM.log:*
> {noformat}
> 2023-03-14 21:53:26,413 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> 1ca038f8-c505-47ca-b701-d542b85bb75b
> 2023-03-14 21:53:26,413 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> 1ac8e090-7eb7-4dab-93b7-97e4845f7b49
> 2023-03-14 23:19:12,206 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:19:53,296 INFO
> org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting
> Decommission for node c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated
> Container #15019
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
> Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container
> #15019
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
> Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl:
> c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1
> underReplicated and 1 unhealthy containers{noformat}
> *DN.log:*
> {noformat}
> 2023-03-14 21:53:32,032 ERROR
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received
> container deletion command for container 15019 but the container is not empty
> with blockCount 1
> 2023-03-14 21:53:32,035 ERROR
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler:
> Exception occurred while deleting the container.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> Non-force deletion of non-empty container is not allowed.
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}