[
https://issues.apache.org/jira/browse/HDDS-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717414#comment-17717414
]
Neil Joshi commented on HDDS-7972:
----------------------------------
A failed node must be decommissioned before it is removed from the Ratis ring
reported by the ozone admin roles commands. Removing the failed node from the
ring is part of the decommission process, so until decommission is run the
roles output still lists the node. For SCM, if the SCM with ID
ad792626-4def-4ee6-a18c-3d073074635d is down, run the scm decommission command
from HDDS-8365:
ozone admin scm decommission --nodeid=ad792626-4def-4ee6-a18c-3d073074635d
This removes the down SCM from the ring, and subsequent calls to ozone admin
scm roles return:
ozone admin scm roles
c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15
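As a rough illustration of the SCM steps above, the node ID needed for the
decommission command can be read from the roles output. This is only a sketch:
it assumes each line has the colon-separated form host:port:ROLE:nodeid:address
seen in this issue, and the names ScmPeer, parse_scm_roles and
decommission_command are hypothetical helpers, not part of Ozone.

```python
from typing import List, NamedTuple, Optional


class ScmPeer(NamedTuple):
    host: str
    port: str
    role: str
    node_id: str
    address: str


def parse_scm_roles(output: str) -> List[ScmPeer]:
    """Parse lines assumed to look like host:port:ROLE:nodeid:address."""
    peers = []
    for line in output.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 5:
            # Skip blank lines and anything not in the assumed format
            # (an IPv6 address, for example, would not fit this split).
            continue
        peers.append(ScmPeer(*parts))
    return peers


def decommission_command(peers: List[ScmPeer], down_host: str) -> Optional[str]:
    """Build the decommission command for the SCM on down_host, if listed."""
    for peer in peers:
        if peer.host == down_host:
            return f"ozone admin scm decommission --nodeid={peer.node_id}"
    return None


# Roles output taken from the issue description, before decommission.
SAMPLE = """\
c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15
c0134.halxg.ca.com:9894:FOLLOWER:ad792626-4def-4ee6-a18c-3d073074635d:10.17.207.44
"""

if __name__ == "__main__":
    print(decommission_command(parse_scm_roles(SAMPLE), "c0134.halxg.ca.com"))
```

The helper only builds the command string; whether the host is actually down
has to be established separately, since the roles output alone does not show it.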
The same applies to OzoneManagers in HA. If the OM with node ID om1 has failed,
it still appears in the output of the roles command:
bash-4.2$ ozone admin om roles --service-id=omservice
om1 : FOLLOWER (om1)
om2 : FOLLOWER (om2)
om3 : LEADER (om3)
Once the failed node, om1, is decommissioned,
{code:java}
bash-4.2$ ozone admin om decommission -id=omservice -nodeid=om1 -hostname=om1 --force{code}
calls to om roles show that the failed node has been removed from the ring:
{code:java}
bash-4.2$ ozone admin om roles --service-id=omservice
om2 : FOLLOWER (om2)
om3 : LEADER (om3){code}
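To confirm that an OM decommission took effect, the roles output from before
and after can be compared. The following is a minimal sketch, assuming each
line has the "nodeid : ROLE (name)" form shown in this comment; parse_om_roles
and removed_nodes are hypothetical helper names, not Ozone APIs.

```python
import re
from typing import Dict, Set

# Assumed line format: "<nodeid> : <ROLE> (<name>)", as in this comment.
_OM_ROLE_LINE = re.compile(r"^(\S+)\s*:\s*(\w+)\s*\(([^)]+)\)\s*$")


def parse_om_roles(output: str) -> Dict[str, str]:
    """Map node ID to role for every line matching the assumed format."""
    roles = {}
    for line in output.splitlines():
        match = _OM_ROLE_LINE.match(line.strip())
        if match:  # shell prompts and blank lines do not match and are skipped
            roles[match.group(1)] = match.group(2)
    return roles


def removed_nodes(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Node IDs present before the decommission but absent afterwards."""
    return set(before) - set(after)


# Outputs copied from this comment.
BEFORE = """\
bash-4.2$ ozone admin om roles --service-id=omservice
om1 : FOLLOWER (om1)
om2 : FOLLOWER (om2)
om3 : LEADER (om3)
"""

AFTER = """\
bash-4.2$ ozone admin om roles --service-id=omservice
om2 : FOLLOWER (om2)
om3 : LEADER (om3)
"""

if __name__ == "__main__":
    print(removed_nodes(parse_om_roles(BEFORE), parse_om_roles(AFTER)))
```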
> ozone admin scm roles reports down SCM as follower
> --------------------------------------------------
>
> Key: HDDS-7972
> URL: https://issues.apache.org/jira/browse/HDDS-7972
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone CLI, SCM
> Reporter: Ritesh Shukla
> Priority: Minor
>
> When an SCM is down, the ozone CLI still reports it as a follower when querying
> ozone admin scm roles. Example: c0134.halxg.ca.com is down when this command is run:
> {code:java}
> ozone admin scm roles
> c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
> c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15
> c0134.halxg.ca.com:9894:FOLLOWER:ad792626-4def-4ee6-a18c-3d073074635d:10.17.207.44{code}