[ 
https://issues.apache.org/jira/browse/HDDS-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717414#comment-17717414
 ] 

Neil Joshi commented on HDDS-7972:
----------------------------------

Failed nodes must be decommissioned to remove them from the Ratis ring shown 
by the ozone admin commands.

 

For the roles output to reflect the ring after a node failure, the failed node 
must be decommissioned.  As part of the node decommission process, the failed 
node is removed from the Ratis ring.

For the SCM, if the SCM with ID ad792626-4def-4ee6-a18c-3d073074635d is down, 
run the SCM decommission command from HDDS-8365:

ozone admin scm decommission --nodeid=ad792626-4def-4ee6-a18c-3d073074635d

This removes the down SCM from the ring, and subsequent calls to ozone admin 
scm roles return:
ozone admin scm roles 
c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15
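
The check after decommissioning can also be scripted. The following is a minimal sketch, assuming the roles output keeps the colon-separated host:port:ROLE:scmid:ip layout shown above. The function name scm_in_ring is hypothetical and not part of Ozone, and the sample input is copied from the output above rather than produced by a live cluster.

```shell
# Hypothetical helper (not part of Ozone): reads roles output on stdin and
# succeeds only if the SCM ID given as $1 appears in the fourth field.
# Assumes each line looks like host:port:ROLE:scmid:ip.
scm_in_ring() {
  awk -F: -v id="$1" '$4 == id { found = 1 } END { exit !found }'
}

# Sample input copied from the roles output above. In practice, pipe in the
# output of `ozone admin scm roles` instead.
scm_roles_after='c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15'

if printf '%s\n' "$scm_roles_after" | scm_in_ring ad792626-4def-4ee6-a18c-3d073074635d; then
  echo "still in ring"
else
  echo "removed from ring"
fi
```

With the sample input this prints "removed from ring", since the decommissioned SCM ID no longer appears in any line.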
 

The same is true for OzoneManagers in HA. When the OM with node ID om1 has 
failed, the failed node is still listed in the output of the roles command:

bash-4.2$ ozone admin om roles --service-id=omservice
om1 : FOLLOWER (om1)
om2 : FOLLOWER (om2)
om3 : LEADER (om3)
Once the failed node, om1, is decommissioned,
{code:java}
bash-4.2$ ozone admin om decommission -id=omservice -nodeid=om1 -hostname=om1 --force{code}
calls to om roles show the failed node removed from the ring:
{code:java}
bash-4.2$ ozone admin om roles --service-id=omservice
om2 : FOLLOWER (om2)
om3 : LEADER (om3){code}
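
The same kind of check can be sketched for the OM ring, assuming each roles line has the form "nodeid : ROLE (hostname)" as in the output above. The function name om_ring_members is hypothetical, and the sample input is copied from the output above.

```shell
# Hypothetical helper (not part of Ozone): reads OM roles output on stdin
# and prints the node ID (first field) of every "nodeid : ROLE (host)" line.
om_ring_members() {
  awk '$2 == ":" { print $1 }'
}

# Sample input copied from the roles output above. In practice, pipe in the
# output of `ozone admin om roles --service-id=omservice` instead.
om_roles_after='om2 : FOLLOWER (om2)
om3 : LEADER (om3)'

# grep -qx matches whole lines only, so "om1" would not match e.g. "om10".
if printf '%s\n' "$om_roles_after" | om_ring_members | grep -qx 'om1'; then
  echo "om1 still in ring"
else
  echo "om1 removed from ring"
fi
```

With the sample input this prints "om1 removed from ring".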
 

> ozone admin scm roles reports down SCM as follower
> --------------------------------------------------
>
>                 Key: HDDS-7972
>                 URL: https://issues.apache.org/jira/browse/HDDS-7972
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone CLI, SCM
>            Reporter: Ritesh Shukla
>            Priority: Minor
>
> When an SCM is down, the ozone cli still reports it as a follower when querying 
> ozone scm roles. Example: c0134.halxg.ca.com is down when this command is run
> {code:java}
> ozone admin scm roles 
> c0104.halxg.c.com:9894:FOLLOWER:44711271-927f-4dd5-be58-796df7033fe3:10.17.207.14
> c0105.halxg.c.com:9894:LEADER:77b1f063-d625-476f-a420-45b0f3ffe30c:10.17.207.15
> c0134.halxg.ca.com:9894:FOLLOWER:ad792626-4def-4ee6-a18c-3d073074635d:10.17.207.44{code}



