Varsha Ravi created HDDS-11380:
----------------------------------

             Summary: Decommissioning of DN fails immediately when network topology is enabled
                 Key: HDDS-11380
                 URL: https://issues.apache.org/jira/browse/HDDS-11380
             Project: Apache Ozone
          Issue Type: Bug
          Components: DN
            Reporter: Varsha Ravi


Decommissioning a DN fails immediately with the error *Insufficient nodes* when 
network topology is enabled.

The cluster has 9 DNs spread across 5 racks.
{noformat}
Error: AllHosts: Insufficient nodes. Tried to decommission 1 nodes of which 1 nodes were valid. Cluster has 3 IN-SERVICE nodes, 3 of which are required for minimum replication.
java.io.IOException: Some nodes could not enter the decommission workflow
        at org.apache.hadoop.hdds.scm.cli.datanode.DecommissionSubCommand.execute(DecommissionSubCommand.java:80)
        at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
        at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
        at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:80)
        at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
        at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
        at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:79)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
        at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:72){noformat}
*Topology details:*
{noformat}
State = HEALTHY
 
DN5:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_cu31u
 
DN1:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_cu31u
 
DN4:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_cu31u
 
DN8:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_co159
 
DN2:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_co159
 
DN9:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_co159
 
DN6:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_hhbkg
 
DN7:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_eyj9h
 
DN3:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
    IN_SERVICE    /rack_eka3e{noformat}
DN to be decommissioned: DN5
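
For reference, a rough sketch of the arithmetic the error message seems to describe. This is an assumption drawn only from the wording of the message, not Ozone's actual code: the class `DecommissionCheckSketch` and the method `canDecommission` are hypothetical names. With the 9 IN_SERVICE DNs listed above, removing 1 node should still leave more than the 3 required, whereas the count of 3 reported in the error would make the same check fail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the SCM implementation.
public class DecommissionCheckSketch {

    // Assumed rule: enough IN_SERVICE nodes must remain after decommissioning.
    public static boolean canDecommission(int inServiceNodes,
                                          int nodesToDecommission,
                                          int requiredForReplication) {
        return inServiceNodes - nodesToDecommission >= requiredForReplication;
    }

    public static void main(String[] args) {
        // IN_SERVICE DNs per rack, taken from the topology details above.
        Map<String, Integer> nodesPerRack = new LinkedHashMap<>();
        nodesPerRack.put("/rack_cu31u", 3); // DN5, DN1, DN4
        nodesPerRack.put("/rack_co159", 3); // DN8, DN2, DN9
        nodesPerRack.put("/rack_hhbkg", 1); // DN6
        nodesPerRack.put("/rack_eyj9h", 1); // DN7
        nodesPerRack.put("/rack_eka3e", 1); // DN3

        int inService = 0;
        for (int count : nodesPerRack.values()) {
            inService += count;
        }

        // Check using the node count visible in the topology output.
        System.out.println("in-service from topology = " + inService
            + ", allowed = " + canDecommission(inService, 1, 3));

        // Check using the node count reported in the error message.
        System.out.println("in-service from error = 3"
            + ", allowed = " + canDecommission(3, 1, 3));
    }
}
```

The count of 3 in the error also happens to equal the number of DNs on DN5's rack (/rack_cu31u); whether that is related to the failure is not confirmed.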

This might be due to the improvement made as part of HDDS-10462.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
