Varsha Ravi created HDDS-7533:
---------------------------------

             Summary: Intermittent failure in Decommissioning Ozone Datanode
                 Key: HDDS-7533
                 URL: https://issues.apache.org/jira/browse/HDDS-7533
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Varsha Ravi


Decommissioning of Ozone datanodes gets stuck and does not complete even after several hours.

STEPS TO REPRODUCE:
---------------------------
 # Start only 3 DNs.
 # Create a non-EC directory and write a significant amount of data to it.
 # Shut down these 3 DNs.
 # Start another set of DNs for writing EC data.
 # Create an EC directory and write a significant amount of data to it.
 # Start 1 DN from the first set of 3 DNs.
 # Decommission 2 DNs from the second (EC) set of DNs.

SCM logs while decommissioning is stuck:
{noformat}
4:58:30.828 PM    ERROR    UnderReplicatedProcessor    Error processing under replicated container ContainerInfo{id=#4, state=CLOSED, pipelineID=PipelineID=e0019753-3738-473b-96b5-2338ce586a18, stateEnterTime=2022-11-18T10:33:33.353Z, owner=om2}
org.apache.hadoop.hdds.scm.exceptions.SCMException: Not enough healthy nodes to allocate container. 2  datanodes required. Found 1
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:218)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:81)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:175)
    at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:117)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:303)
    at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:186)
    at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:471)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:99)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:83)
    at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.run(UnderReplicatedProcessor.java:138)
    at java.base/java.lang.Thread.run(Thread.java:834)
4:58:30.829 PM    ERROR    SCMCommonPlacementPolicy    Not enough healthy nodes to allocate container. 2  datanodes required. Found 1
{noformat}
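
To illustrate how the reproduction steps could lead to the "2 datanodes required. Found 1" error, here is a minimal, self-contained Java sketch. It is not Ozone's placement code: the class PlacementSketch, the State enum, the Node record and the chooseTargets method are all hypothetical, and the EC set size of five DNs is an assumption (the report does not state it). The sketch only models the idea that a target must be in service and must not already hold a replica of the container. It is one possible reading of the log, not a confirmed root cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a "find N healthy targets" check; NOT the Ozone implementation.
class PlacementSketch {
    /** Hypothetical node states, simplified for illustration. */
    enum State { IN_SERVICE, DECOMMISSIONING, DEAD }

    /** Hypothetical datanode record: a name plus its current state. */
    static final class Node {
        final String name;
        final State state;
        Node(String name, State state) { this.name = name; this.state = state; }
    }

    /** Picks `required` in-service nodes that are not in `excluded`, or throws. */
    static List<Node> chooseTargets(List<Node> cluster, List<String> excluded, int required) {
        List<Node> healthy = new ArrayList<>();
        for (Node n : cluster) {
            if (n.state == State.IN_SERVICE && !excluded.contains(n.name)) {
                healthy.add(n);
            }
        }
        if (healthy.size() < required) {
            throw new IllegalStateException("Not enough healthy nodes to allocate container. "
                + required + " datanodes required. Found " + healthy.size());
        }
        return new ArrayList<>(healthy.subList(0, required));
    }

    /** Cluster after the reproduction steps, assuming the EC set has five DNs. */
    static List<Node> reproCluster() {
        return Arrays.asList(
            new Node("dn1", State.IN_SERVICE),        // restarted DN from the first set
            new Node("dn2", State.DEAD),              // first set, still shut down
            new Node("dn3", State.DEAD),              // first set, still shut down
            new Node("ec1", State.DECOMMISSIONING),   // being decommissioned
            new Node("ec2", State.DECOMMISSIONING),   // being decommissioned
            new Node("ec3", State.IN_SERVICE),
            new Node("ec4", State.IN_SERVICE),
            new Node("ec5", State.IN_SERVICE));
    }

    /** Assumed: the remaining EC DNs already hold a replica of the container. */
    static List<String> reproExcluded() {
        return Arrays.asList("ec3", "ec4", "ec5");
    }

    public static void main(String[] args) {
        try {
            // Two decommissioning replicas need new homes, so two targets are required.
            chooseTargets(reproCluster(), reproExcluded(), 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions only dn1 qualifies as a target, so the request for two targets fails on every retry. In this simplified model the request would succeed once one more in-service DN without a replica joins the cluster.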


