Varsha Ravi created HDDS-11380:
----------------------------------
Summary: Decommissioning of DN fails immediately when network topology is enabled
Key: HDDS-11380
URL: https://issues.apache.org/jira/browse/HDDS-11380
Project: Apache Ozone
Issue Type: Bug
Components: DN
Reporter: Varsha Ravi
Decommission of DN fails immediately with the error *Insufficient nodes* when
network topology is enabled.
The cluster has 9 DNs spread across 5 racks.
{noformat}
Error: AllHosts: Insufficient nodes. Tried to decommission 1 nodes of which 1
nodes were valid. Cluster has 3 IN-SERVICE nodes, 3 of which are required for
minimum replication.
java.io.IOException: Some nodes could not enter the decommission workflow
    at org.apache.hadoop.hdds.scm.cli.datanode.DecommissionSubCommand.execute(DecommissionSubCommand.java:80)
    at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
    at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
    at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:80)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
    at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:79)
    at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
    at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:72)
{noformat}
*Topology details:*
{noformat}
State = HEALTHY
DN5:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_cu31u
DN1:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_cu31u
DN4:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_cu31u
DN8:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_co159
DN2:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_co159
DN9:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_co159
DN6:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_hhbkg
DN7:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_eyj9h
DN3:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859 IN_SERVICE /rack_eka3e
{noformat}
DN to be decommissioned: DN5
This might be due to the improvement made as part of HDDS-10462.
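For anyone cross-checking the numbers: the error reports 3 IN-SERVICE nodes, while the topology listing shows 9 IN_SERVICE DNs across 5 racks, and the rack of DN5 (/rack_cu31u) holds exactly 3 of them. The snippet below only tallies the listing from this report. It is not Ozone code, the class and method names are made up for illustration, and the match with the per-rack count is an observation, not a confirmed root cause.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies the DN-to-rack listing from this report.
class TopologyTally {

    // DN name -> rack, copied from the topology details in this report.
    // All nine DNs are listed as IN_SERVICE.
    static Map<String, String> reportedTopology() {
        Map<String, String> dnToRack = new LinkedHashMap<>();
        dnToRack.put("DN5", "/rack_cu31u");
        dnToRack.put("DN1", "/rack_cu31u");
        dnToRack.put("DN4", "/rack_cu31u");
        dnToRack.put("DN8", "/rack_co159");
        dnToRack.put("DN2", "/rack_co159");
        dnToRack.put("DN9", "/rack_co159");
        dnToRack.put("DN6", "/rack_hhbkg");
        dnToRack.put("DN7", "/rack_eyj9h");
        dnToRack.put("DN3", "/rack_eka3e");
        return dnToRack;
    }

    // Counts how many DNs sit on each rack.
    static Map<String, Integer> countPerRack(Map<String, String> dnToRack) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String rack : dnToRack.values()) {
            counts.merge(rack, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> topology = reportedTopology();
        Map<String, Integer> perRack = countPerRack(topology);
        System.out.println("IN_SERVICE DNs in cluster: " + topology.size());
        System.out.println("Racks: " + perRack.size());
        System.out.println("DNs on the rack of DN5: " + perRack.get(topology.get("DN5")));
    }
}
```

The cluster-wide count (9) is the figure one would expect in the "Cluster has ... IN-SERVICE nodes" part of the message, so comparing it against the per-rack count may help narrow down where the node count is computed.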