Varsha Ravi created HDDS-7533:
---------------------------------
Summary: Intermittent failure in Decommissioning Ozone Datanode
Key: HDDS-7533
URL: https://issues.apache.org/jira/browse/HDDS-7533
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Datanode
Reporter: Varsha Ravi
Decommissioning of an Ozone datanode gets stuck and does not complete even after hours.
STEPS TO REPRODUCE:
---------------------------
# Start only 3 DNs.
# Create a non-EC directory and write a significant amount of data to it.
# Shut down these 3 DNs.
# Start a different set of DNs for writing EC data.
# Create an EC directory and write a significant amount of data to it.
# Start 1 DN from the first set of 3 DNs.
# Decommission 2 DNs from the second (EC) set of DNs.
SCM logs while decommissioning is stuck:
{noformat}
4:58:30.828 PM ERROR UnderReplicatedProcessor
Error processing under replicated container ContainerInfo{id=#4, state=CLOSED, pipelineID=PipelineID=e0019753-3738-473b-96b5-2338ce586a18, stateEnterTime=2022-11-18T10:33:33.353Z, owner=om2}
org.apache.hadoop.hdds.scm.exceptions.SCMException: Not enough healthy nodes to allocate container. 2 datanodes required. Found 1
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:218)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:81)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:175)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:117)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:303)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:186)
    at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:471)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:99)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:83)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.run(UnderReplicatedProcessor.java:138)
    at java.base/java.lang.Thread.run(Thread.java:834)
4:58:30.829 PM ERROR SCMCommonPlacementPolicy
Not enough healthy nodes to allocate container. 2 datanodes required. Found 1
{noformat}
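One possible reading of the log, sketched below as a toy model rather than Ozone's actual placement code: replacing the replicas held by the 2 decommissioning DNs needs 2 new target datanodes, but the only healthy node that does not already hold a replica is the single DN restarted from the first set. The {{Datanode}} class, the {{choose_targets}} function, the node names, the size of the EC set (5 here) and the rule that nodes already holding a replica are excluded are all assumptions made for illustration; the report does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datanode:
    """Hypothetical stand-in for a datanode as seen by the SCM."""
    name: str
    healthy: bool         # node is up and reachable
    holds_replica: bool   # node already stores a replica of the container


def choose_targets(nodes, required):
    """Pick `required` healthy nodes that do not already hold a replica.

    Assumption: nodes holding a replica are excluded as targets. The error
    text is modelled on the SCM log line quoted above.
    """
    candidates = [n for n in nodes if n.healthy and not n.holds_replica]
    if len(candidates) < required:
        raise RuntimeError(
            "Not enough healthy nodes to allocate container. "
            f"{required} datanodes required. Found {len(candidates)}"
        )
    return candidates[:required]


# Assumed cluster state after the reproduction steps:
# an EC set of 5 DNs (size is a guess), each holding a replica;
# 1 DN from the first set restarted; the other 2 still shut down.
nodes = (
    [Datanode(f"ec-dn{i}", healthy=True, holds_replica=True) for i in range(1, 6)]
    + [Datanode("dn1", healthy=True, holds_replica=False)]
    + [Datanode("dn2", healthy=False, holds_replica=False),
       Datanode("dn3", healthy=False, holds_replica=False)]
)

try:
    # Decommissioning 2 EC DNs means 2 replacement targets are needed.
    choose_targets(nodes, required=2)
except RuntimeError as err:
    message = str(err)
    print(message)
```

Under these assumptions, decommissioning a single DN would find a target ({{dn1}}), while decommissioning 2 cannot proceed until another healthy DN is available. Whether this fully explains the intermittent behaviour is not established by the report.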