[
https://issues.apache.org/jira/browse/HDDS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDDS-8179:
------------------------------------
Summary: Improve Decommission log message for unhealthy containers (was:
Datanode decommissioning blocked due to unhealthy container)
> Improve Decommission log message for unhealthy containers
> ---------------------------------------------------------
>
> Key: HDDS-8179
> URL: https://issues.apache.org/jira/browse/HDDS-8179
> Project: Apache Ozone
> Issue Type: Bug
> Components: ECOfflineRecovery, SCM
> Reporter: Varsha Ravi
> Assignee: Stephen O'Donnell
> Priority: Major
>
> The Replication Manager is sending a delete container command to a non-empty
> container due to HDDS-7775. The container is not deleted, but *subsequent
> decommissioning calls to any of the DNs do not complete* because the
> container is in both the under-replicated and unhealthy states.
> *SCM.log:*
> {noformat}
> 2023-03-14 21:53:26,413 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> 1ca038f8-c505-47ca-b701-d542b85bb75b
> 2023-03-14 21:53:26,413 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> 1ac8e090-7eb7-4dab-93b7-97e4845f7b49
> 2023-03-14 23:19:12,206 INFO
> org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
> command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force:
> false] for container ContainerInfo{id=#15019, state=DELETING,
> pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
> stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
> c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:19:53,296 INFO
> org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting
> Decommission for node c5c3948e-1296-4313-8c4e-9e6e50424280
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated
> Container #15019
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
> Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container
> #15019
> org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
> Replicas{
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
> placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
> placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=4},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
> placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
> placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=3},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
> placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=1},
> ContainerReplica{containerID=#15019, state=CLOSED,
> datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
> placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
> bytesUsed=102400,replicaIndex=5}}
> 2023-03-14 23:22:38,512 INFO
> org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl:
> c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1
> underReplicated and 1 unhealthy containers{noformat}
> *DN.log:*
> {noformat}
> 2023-03-14 21:53:32,032 ERROR
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received
> container deletion command for container 15019 but the container is not empty
> with blockCount 1
> 2023-03-14 21:53:32,035 ERROR
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler:
> Exception occurred while deleting the container.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> Non-force deletion of non-empty container is not allowed.
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834){noformat}
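
The replica list in the SCM log above can be checked mechanically to see which EC replica indexes are present. The following is a minimal sketch, not Ozone code: it assumes the container uses five replica indexes (1 to 5), which the issue does not state explicitly, and the names `REPLICA_LOG` and `tally_replica_indexes` are hypothetical. The replica entries are copied from the "Under Replicated Container #15019" message, trimmed to the two fields used here.

```python
import re

# Replica entries from the "Under Replicated Container #15019" log message,
# trimmed to datanodeDetails and replicaIndex (the only fields used below).
REPLICA_LOG = """
ContainerReplica{datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc, replicaIndex=5},
ContainerReplica{datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678, replicaIndex=4},
ContainerReplica{datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b, replicaIndex=1},
ContainerReplica{datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280, replicaIndex=3},
ContainerReplica{datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578, replicaIndex=1},
ContainerReplica{datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, replicaIndex=5}}
"""

# Matches one replica entry: the datanode UUID and its replica index.
REPLICA_RE = re.compile(
    r"datanodeDetails=([0-9a-f-]+).*?replicaIndex=(\d+)", re.S
)


def tally_replica_indexes(log_text, expected_indexes):
    """Group datanodes by replica index and report gaps and duplicates.

    expected_indexes is an assumption supplied by the caller (for example
    range(1, 6) for a five-index EC layout); it is not read from the log.
    """
    holders = {}
    for datanode, index in REPLICA_RE.findall(log_text):
        holders.setdefault(int(index), []).append(datanode)

    missing = sorted(set(expected_indexes) - set(holders))
    duplicated = sorted(i for i, nodes in holders.items() if len(nodes) > 1)
    return holders, missing, duplicated


if __name__ == "__main__":
    holders, missing, duplicated = tally_replica_indexes(REPLICA_LOG, range(1, 6))
    print("missing indexes:", missing)
    print("duplicated indexes:", duplicated)
    for index in duplicated:
        print(f"index {index} held by:", ", ".join(holders[index]))
```

Under the five-index assumption, the six reported replicas cover only indexes 1, 3, 4 and 5, with indexes 1 and 5 each held by two datanodes and no replica for index 2. The duplicated indexes are the ones named in the two 21:53:26 deleteContainerCommand lines, both rejected on the DN side as non-force deletions of a non-empty container.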