Varsha Ravi created HDDS-8179:
---------------------------------
Summary: Replication Manager sending delete container command to a
non-empty container
Key: HDDS-8179
URL: https://issues.apache.org/jira/browse/HDDS-8179
Project: Apache Ozone
Issue Type: Bug
Components: ECOfflineRecovery, SCM
Reporter: Varsha Ravi
The Replication Manager is sending a delete container command to a non-empty
container. The container is not deleted, but *subsequent decommissioning
calls to any of the DNs do not complete* because the container is in both an
under-replicated and an unhealthy state.
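To illustrate the expected behaviour, here is a minimal sketch of a check that could run before a non-force delete is sent. It is not the actual ReplicationManager code and not necessarily how this should be fixed. The class {{DeleteGuardSketch}}, the nested {{Replica}} type and the method {{safeToSendNonForceDelete}} are hypothetical names; only the field names (keyCount, bytesUsed, replicaIndex) mirror what SCM.log prints for each replica.

```java
import java.util.Arrays;
import java.util.List;

public class DeleteGuardSketch {

  /** Hypothetical stand-in for the replica fields printed in SCM.log. */
  static final class Replica {
    final String datanode;
    final int replicaIndex;
    final long keyCount;
    final long bytesUsed;

    Replica(String datanode, int replicaIndex, long keyCount, long bytesUsed) {
      this.datanode = datanode;
      this.replicaIndex = replicaIndex;
      this.keyCount = keyCount;
      this.bytesUsed = bytesUsed;
    }
  }

  /**
   * Assumed rule: a non-force delete is only worth sending when every
   * reported replica has keyCount == 0. An empty list is vacuously safe.
   */
  static boolean safeToSendNonForceDelete(List<Replica> replicas) {
    for (Replica r : replicas) {
      if (r.keyCount > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Replica values copied from the SCM.log excerpt for container #15019.
    List<Replica> reported = Arrays.asList(
        new Replica("ba62c66a-a342-4147-8344-3ce91726c2dc", 5, 1, 102400),
        new Replica("15af7526-8376-45c4-97a5-7a74b7abc678", 4, 1, 102400),
        new Replica("1ca038f8-c505-47ca-b701-d542b85bb75b", 1, 1, 102400),
        new Replica("c5c3948e-1296-4313-8c4e-9e6e50424280", 3, 1, 102400),
        new Replica("f689fc55-e0e3-4785-9f2a-f799e18f0578", 1, 1, 102400),
        new Replica("1ac8e090-7eb7-4dab-93b7-97e4845f7b49", 5, 1, 102400));

    // Every replica reports keyCount=1, so the guard rejects the delete.
    System.out.println(safeToSendNonForceDelete(reported)); // prints "false"
  }
}
```

With the replica data from this report, such a check would have held back the delete commands that the DN later rejects.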
*SCM.log:*
{noformat}
2023-03-14 21:53:26,413 INFO
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
command [deleteContainerCommand: containerID: 15019, replicaIndex: 1, force:
false] for container ContainerInfo{id=#15019, state=DELETING,
pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
1ca038f8-c505-47ca-b701-d542b85bb75b
2023-03-14 21:53:26,413 INFO
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
command [deleteContainerCommand: containerID: 15019, replicaIndex: 5, force:
false] for container ContainerInfo{id=#15019, state=DELETING,
pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
1ac8e090-7eb7-4dab-93b7-97e4845f7b49
2023-03-14 23:19:12,206 INFO
org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending
command [deleteContainerCommand: containerID: 15019, replicaIndex: 3, force:
false] for container ContainerInfo{id=#15019, state=DELETING,
pipelineID=PipelineID=e3fb8629-89ee-472a-9c43-3962629bd7a9,
stateEnterTime=2023-03-14T19:17:07.315Z, owner=om2} to
c5c3948e-1296-4313-8c4e-9e6e50424280
2023-03-14 23:19:53,296 INFO
org.apache.hadoop.hdds.scm.node.NodeDecommissionManager: Starting Decommission
for node c5c3948e-1296-4313-8c4e-9e6e50424280
2023-03-14 23:22:38,512 INFO
org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Under Replicated
Container #15019
org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
Replicas{
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=5},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=4},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=1},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=3},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=1},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=5}}
2023-03-14 23:22:38,512 INFO
org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl: Unhealthy Container
#15019
org.apache.hadoop.hdds.scm.container.replication.ECContainerReplicaCount@2bd10f2f;
Replicas{
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=ba62c66a-a342-4147-8344-3ce91726c2dc,
placeOfBirth=ba62c66a-a342-4147-8344-3ce91726c2dc, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=5},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=15af7526-8376-45c4-97a5-7a74b7abc678,
placeOfBirth=15af7526-8376-45c4-97a5-7a74b7abc678, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=4},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=1ca038f8-c505-47ca-b701-d542b85bb75b,
placeOfBirth=1ca038f8-c505-47ca-b701-d542b85bb75b, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=1},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=c5c3948e-1296-4313-8c4e-9e6e50424280,
placeOfBirth=c5c3948e-1296-4313-8c4e-9e6e50424280, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=3},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=f689fc55-e0e3-4785-9f2a-f799e18f0578,
placeOfBirth=f689fc55-e0e3-4785-9f2a-f799e18f0578, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=1},
ContainerReplica{containerID=#15019, state=CLOSED,
datanodeDetails=1ac8e090-7eb7-4dab-93b7-97e4845f7b49,
placeOfBirth=1ac8e090-7eb7-4dab-93b7-97e4845f7b49, sequenceId=0, keyCount=1,
bytesUsed=102400,replicaIndex=5}}
2023-03-14 23:22:38,512 INFO
org.apache.hadoop.hdds.scm.node.DatanodeAdminMonitorImpl:
c5c3948e-1296-4313-8c4e-9e6e50424280 has 60 sufficientlyReplicated, 1
underReplicated and 1 unhealthy containers{noformat}
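For readers scanning the replica dump, the small sketch below tallies the replicaIndex values reported for container #15019. It is only an aid for reading the log, not Ozone code. The class {{ReplicaIndexSketch}} and its methods are hypothetical, and the total of 5 replica indexes is an assumption (the EC scheme is not stated in this report). The logs alone do not establish which of these observations drives the monitor's under-replicated or unhealthy verdict.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicaIndexSketch {

  /** Counts how many replicas were reported for each replica index. */
  static Map<Integer, Integer> countByIndex(List<Integer> indexes) {
    Map<Integer, Integer> counts = new TreeMap<>();
    for (int idx : indexes) {
      counts.merge(idx, 1, Integer::sum);
    }
    return counts;
  }

  /** Lists indexes in 1..totalIndexes that have no replica at all. */
  static List<Integer> missingIndexes(Map<Integer, Integer> counts,
                                      int totalIndexes) {
    List<Integer> missing = new ArrayList<>();
    for (int i = 1; i <= totalIndexes; i++) {
      if (!counts.containsKey(i)) {
        missing.add(i);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // replicaIndex values in the order they appear in the SCM.log dump.
    List<Integer> reported = Arrays.asList(5, 4, 1, 3, 1, 5);

    Map<Integer, Integer> counts = countByIndex(reported);
    System.out.println(counts); // prints "{1=2, 3=1, 4=1, 5=2}"

    // Assumption: 5 replica indexes in total (scheme not stated in the report).
    System.out.println(missingIndexes(counts, 5)); // prints "[2]"
  }
}
```

Under that assumption, indexes 1 and 5 each have two replicas, index 2 has none, and index 3 exists only on c5c3948e-1296-4313-8c4e-9e6e50424280, the node being decommissioned.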
*DN.log:*
{noformat}
2023-03-14 21:53:32,032 ERROR
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received container
deletion command for container 15019 but the container is not empty with
blockCount 1
2023-03-14 21:53:32,035 ERROR
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler:
Exception occurred while deleting the container.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
Non-force deletion of non-empty container is not allowed.
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1303)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1160)
at
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
at
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.handleInternal(DeleteContainerCommandHandler.java:108)
at
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:78)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834){noformat}