[ 
https://issues.apache.org/jira/browse/HDDS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-8179:
------------------------------------
    Summary: Improve Decommission log message for unhealthy containers  (was: 
Datanode decommissioning blocked due to unhealthy container)

> Improve Decommission log message for unhealthy containers
> ---------------------------------------------------------
>
>                 Key: HDDS-8179
>                 URL: https://issues.apache.org/jira/browse/HDDS-8179
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: ECOfflineRecovery, SCM
>            Reporter: Varsha Ravi
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> The Replication Manager is sending a delete container command to a non-empty
> container due to HDDS-7775. The container is not deleted, but *subsequent
> decommissioning calls to any of the DNs are not completing* because the
> container is reported as both under-replicated and unhealthy.
> *SCM.log:*
> {noformat}
> 2023-03-14 21:53:26,413 INFO 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force: 
> false] for container ContainerInfo{id=#15019, state=DELETING, 
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, 
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 
> 1ca038f8-c505-47ca-b701-d542b85bb75b
> 2023-03-14 21:53:26,413 INFO 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force: 
> false] for container ContainerInfo{id=#15019, state=DELETING, 
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, 
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 
> 1ac8e090-7eb7-4dab-93b7-97e4845f7b49
> 2023-03-14 23:19:12,206 INFO 
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending 
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force: 
> false] for container ContainerInfo{id=#15019, state=DELETING, 
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9, 
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to 
> c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:19:53,296 INFO 
> org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting 
> Decommission for node c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:22:38,512 INFO 
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated 
> Container #15019 
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
>  Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, 
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, 
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, 
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, 
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, 
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, 
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO 
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container 
> #15019 
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
>  Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, 
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, 
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, 
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, 
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, 
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED, 
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, 
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1, 
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO 
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: 
> c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1 
> underReplicated and 1 unhealthy containers{noformat}
> *DN.log:*
> {noformat}
> 2023-03-14 21:53:32,032 ERROR 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received 
> container deletion command for container 15019 but the container is not empty 
> with blockCount 1
> 2023-03-14 21:53:32,035 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler:
>  Exception occurred while deleting the container.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Non-force deletion of non-empty container is not allowed.
>     at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
>     at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
>     at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
>     at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
>     at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834){noformat}
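The DN log shows `KeyValueHandler` rejecting the delete because container 15019 still holds one block, after which `DeleteContainerCommandHandler` reports `StorageContainerException: Non-force deletion of non-empty container is not allowed.` The sketch below is a simplified, hypothetical model of that guard, with the condition (force flag false and block count above zero) inferred from the two log messages. It is not Ozone's `KeyValueHandler` code; `DeleteContainerGuard`, `checkDelete` and `DeleteRefusedException` are illustrative names, the last standing in for `StorageContainerException`.

```java
/** Hypothetical, simplified model of the datanode-side non-force delete check. */
public class DeleteContainerGuard {

  /** Hypothetical stand-in for StorageContainerException. */
  static class DeleteRefusedException extends Exception {
    DeleteRefusedException(String message) {
      super(message);
    }
  }

  /** Returns normally if the delete may proceed; throws if a non-force delete hits a non-empty container. */
  static void checkDelete(long containerId, long blockCount, boolean force)
      throws DeleteRefusedException {
    if (!force && blockCount > 0) {
      throw new DeleteRefusedException("Container " + containerId
          + " has blockCount " + blockCount
          + ": non-force deletion of non-empty container is not allowed.");
    }
  }

  public static void main(String[] args) {
    // Values from the DN log: container 15019, blockCount 1, force=false.
    try {
      checkDelete(15019L, 1L, false);
      System.out.println("deleted");
    } catch (DeleteRefusedException e) {
      System.out.println("refused: " + e.getMessage());
    }
  }
}
```

With the logged values the delete is refused, so the replicas stay on disk. This matches the later `DatanodeAdminMonitorImpl` output, which still lists the replicas on 1ca038f8, 1ac8e090 and c5c3948e, the three datanodes that had been sent delete commands.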



--
This message was sent by Atlassian Jira
(v8.20.10#820010)