[ https://issues.apache.org/jira/browse/HDDS-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879151#comment-17879151 ]
Stephen O'Donnell commented on HDDS-11380:
------------------------------------------
Something like this makes sense to me:
> Tried to decommission X out of Y IN_SERVICE nodes. Cannot decommission as a
> minimum of ? IN_SERVICE nodes are required to maintain replication.
If some of the nodes passed are invalid, a message for each one could be
printed before it as they are checked, e.g.:
> decommission of X is invalid because ...
> decommission of Y is invalid because ...
> Tried to decommission X out of Y IN_SERVICE nodes. Cannot decommission as a
> minimum of ? IN_SERVICE nodes are required to maintain replication.
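A minimal sketch of how the messages above could be assembled, in case it helps the discussion. The class name, method name, parameters, and the `inService - requested < minInService` check are all assumptions made for illustration; none of this is the existing Ozone code, and the real validation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Ozone code base. */
final class DecommissionMessages {

  private DecommissionMessages() { }

  /**
   * @param invalidNodes rejected host -> reason, in the order they were checked
   * @param requested    X: number of nodes the caller tried to decommission
   * @param inService    Y: number of IN_SERVICE nodes in the cluster
   * @param minInService IN_SERVICE nodes required to maintain replication
   * @return the lines to print; empty if nothing was rejected
   */
  static List<String> build(Map<String, String> invalidNodes,
      int requested, int inService, int minInService) {
    List<String> lines = new ArrayList<>();
    // One line per invalid node, printed before the summary.
    for (Map.Entry<String, String> e : invalidNodes.entrySet()) {
      lines.add("decommission of " + e.getKey()
          + " is invalid because " + e.getValue());
    }
    // Assumed check: the nodes left in service must still cover the minimum.
    if (inService - requested < minInService) {
      lines.add("Tried to decommission " + requested + " out of " + inService
          + " IN_SERVICE nodes. Cannot decommission as a minimum of "
          + minInService
          + " IN_SERVICE nodes are required to maintain replication.");
    }
    return lines;
  }
}
```

With the numbers from the error quoted in the issue (1 node requested, 3 IN_SERVICE nodes counted, 3 required), `DecommissionMessages.build(Map.of(), 1, 3, 3)` returns only the summary line. Passing an insertion-ordered map such as `LinkedHashMap` keeps the per-node lines in the order the nodes were checked.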
> Error message when DN decommissioning fails early needs to be more
> comprehensive
> --------------------------------------------------------------------------------
>
> Key: HDDS-11380
> URL: https://issues.apache.org/jira/browse/HDDS-11380
> Project: Apache Ozone
> Issue Type: Improvement
> Components: DN
> Reporter: Varsha Ravi
> Assignee: Varsha Ravi
> Priority: Minor
> Labels: pull-request-available
>
> Decommission of DN fails immediately with the error *Insufficient nodes* when
> network topology is enabled.
> The cluster has 9 DNs spread across 5 racks.
> {noformat}
> Error: AllHosts: Insufficient nodes. Tried to decommission 1 nodes of which 1
> nodes were valid. Cluster has 3 IN-SERVICE nodes, 3 of which are required for
> minimum replication.
> java.io.IOException: Some nodes could not enter the decommission workflow
> at org.apache.hadoop.hdds.scm.cli.datanode.DecommissionSubCommand.execute(DecommissionSubCommand.java:80)
> at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
> at org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
> at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
> at picocli.CommandLine.access$1500(CommandLine.java:148)
> at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
> at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
> at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
> at picocli.CommandLine.execute(CommandLine.java:2174)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
> at org.apache.hadoop.hdds.cli.OzoneAdmin.lambda$execute$0(OzoneAdmin.java:80)
> at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
> at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
> at org.apache.hadoop.hdds.cli.OzoneAdmin.execute(OzoneAdmin.java:79)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
> at org.apache.hadoop.hdds.cli.OzoneAdmin.main(OzoneAdmin.java:72){noformat}
> *Topology details:*
> {noformat}
> State = HEALTHY
>
> DN5:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_cu31u
>
> DN1:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_cu31u
>
> DN4:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_cu31u
>
> DN8:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_co159
>
> DN2:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_co159
>
> DN9:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_co159
>
> DN6:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_hhbkg
>
> DN7:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_eyj9h
>
> DN3:HTTPS=9883,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,STANDALONE=9859
> IN_SERVICE /rack_eka3e{noformat}
> DN to be decommissioned: DN5
> This might be due to the improvement made as part of HDDS-10462.