[ https://issues.apache.org/jira/browse/HDDS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen Geng reassigned HDDS-4740:
-------------------------------

    Assignee: Glen Geng

> admin command should be regardless of leadership of SCM.
> --------------------------------------------------------
>
>                 Key: HDDS-4740
>                 URL: https://issues.apache.org/jira/browse/HDDS-4740
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM HA
>    Affects Versions: 1.1.0
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>
> *Scope*
> Admin commands include ReplicationManager (rm) start/stop and safe mode exit.
>  
> *Requirement*
> 1. When the admin stops rm, rm in all SCMs should stop, and a re-election
> should not cause rm to start on the new leader.
> 2. When the admin starts rm, it should take effect only on the leader, and
> only once the leader is out of safe mode. If the leader is in safe mode, rm
> does not take effect even if the admin starts it explicitly.
> 3. The admin rm start/stop state does not survive a restart of an SCM
> instance. When the admin stops rm for the SCM cluster, they should watch
> whether any SCM crashes.
>  
> *Status*
> 1. Currently, admin rm start/stop creates/destroys the rm thread.
> 2. SCMContainerLocationFailoverProxyProvider acts as the client's
> FailoverProxyProvider (fpp): it tries the SCMs in ozone.scm.names in
> round-robin order until the request is handled successfully. On the server
> side, every client request first goes through an isLeader check, and a
> non-leader returns a NotLeaderException (nle), which triggers the fpp to fail
> over to the next SCM.
> 3. SCMService decides whether the next rm iteration takes effect by switching
> between the RUNNING and PAUSING states.
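A minimal Python model of the current request path described in Status item 2. It is not Ozone's actual API: Scm, submit and the in-process NotLeaderException class are hypothetical stand-ins, and the client makes a single round-robin pass instead of following a real retry policy.

```python
from dataclasses import dataclass


class NotLeaderException(Exception):
    """Hypothetical stand-in for the 'nle' returned by a non-leader SCM."""


@dataclass
class Scm:
    name: str
    is_leader: bool = False

    def handle(self, request: str) -> str:
        # Server side: every request goes through the isLeader check first.
        if not self.is_leader:
            raise NotLeaderException(self.name)
        return f"{request} handled by {self.name}"


def submit(scms: list[Scm], request: str, start: int = 0) -> str:
    """Client side: try SCMs round-robin, failing over on each nle."""
    for attempt in range(len(scms)):
        scm = scms[(start + attempt) % len(scms)]
        try:
            return scm.handle(request)
        except NotLeaderException:
            continue  # fail over to the next SCM in the list
    raise RuntimeError("no SCM accepted the request")
```

For example, with scms = [Scm("scm1"), Scm("scm2", is_leader=True), Scm("scm3")], submit(scms, "getContainer") returns "getContainer handled by scm2". Because the loop stops at the first success, an admin rm start/stop sent this way would reach only the leader in this model.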
>  
> *Solution:*
> When an SCM receives an rm start/stop request, it skips the isLeader check
> and simply creates/destroys the rm thread. The client side then fakes an
> exception to trigger the fpp to try the next SCM in round-robin order, so
> the request is applied on every SCM.
> The RUNNING/PAUSING status and rm start/stop can be treated separately:
> admin operations and Raft status are two independent dimensions of the
> requirements.
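The proposed solution can be modelled the same way. Again this is a hypothetical sketch, not Ozone code: RoundRobinProxy, invoke, set_rm_everywhere and FailoverNeeded are illustrative names, and flipping rm_thread_alive stands in for creating or destroying the rm thread on the server.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Scm:
    name: str
    is_leader: bool = False
    rm_thread_alive: bool = True


class FailoverNeeded(Exception):
    """Hypothetical: any exception the retry loop treats as 'try the next SCM'."""


class RoundRobinProxy:
    """Hypothetical stand-in for the failover proxy provider (fpp)."""

    def __init__(self, scms: list[Scm]):
        self.scms = list(scms)
        self.index = 0

    def current(self) -> Scm:
        return self.scms[self.index % len(self.scms)]

    def perform_failover(self) -> None:
        self.index += 1


def invoke(proxy: RoundRobinProxy, call: Callable[[Scm], T],
           max_attempts: int) -> Optional[T]:
    """Generic retry loop: call the current SCM, fail over on FailoverNeeded."""
    for _ in range(max_attempts):
        try:
            return call(proxy.current())
        except FailoverNeeded:
            proxy.perform_failover()
    return None  # attempts exhausted


def set_rm_everywhere(proxy: RoundRobinProxy, start: bool) -> list[str]:
    """Apply rm start/stop on every SCM, regardless of leadership."""
    touched: list[str] = []

    def call(scm: Scm) -> None:
        # Server side: no isLeader check; just create/destroy the rm thread.
        scm.rm_thread_alive = start
        touched.append(scm.name)
        # Client side: fake an exception so the fpp moves to the next SCM.
        raise FailoverNeeded()

    invoke(proxy, call, max_attempts=len(proxy.scms))
    return touched
```

The fake exception is what turns an ordinary "stop at the first success" retry loop into a broadcast: bounding the attempts at the number of SCMs visits each one exactly once, leader or not.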
>  
> *This meets the above requirements:*
> 1. When the admin stops rm, rm in all SCMs should stop, and a re-election
> should not cause rm to start on the new leader.
> Met: admin rm stop destroys the rm thread in all SCMs.
>  
> 2. When the admin starts rm, it should take effect only on the leader, and
> only once the leader is out of safe mode. If the leader is in safe mode, rm
> does not take effect even if the admin starts it explicitly.
> Met: admin rm start creates the rm thread in all SCMs, but SCMStatus is
> decided by leadership and safe mode.
>  
> 3. The admin rm start/stop state does not survive a restart of an SCM
> instance. When the admin stops rm for the SCM cluster, they should watch
> whether any SCM crashes.
> Met. This is actually a relaxed requirement.
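The two independent dimensions can be modelled as two conditions that must both allow rm to run. This is a sketch under stated assumptions: ScmNode and its fields are hypothetical, ServiceStatus mirrors the RUNNING/PAUSING states named above, and a freshly constructed node models a restarted SCM whose rm thread exists again by default (requirement 3).

```python
from dataclasses import dataclass
from enum import Enum


class ServiceStatus(Enum):
    RUNNING = "RUNNING"
    PAUSING = "PAUSING"


@dataclass
class ScmNode:
    is_leader: bool
    in_safe_mode: bool
    # Admin dimension: created/destroyed by rm start/stop. Assumed True after
    # a (re)start, so an admin stop does not survive a restart.
    rm_thread_alive: bool = True

    def service_status(self) -> ServiceStatus:
        # Raft dimension: only a leader that is out of safe mode is RUNNING.
        if self.is_leader and not self.in_safe_mode:
            return ServiceStatus.RUNNING
        return ServiceStatus.PAUSING

    def rm_iteration_takes_effect(self) -> bool:
        # An rm iteration does work only when both dimensions allow it.
        return (self.rm_thread_alive
                and self.service_status() is ServiceStatus.RUNNING)
```

With this model, requirement 1 holds because a newly elected leader still has rm_thread_alive set to False, and requirement 2 holds because a leader in safe mode reports PAUSING regardless of the admin flag.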



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
