[
https://issues.apache.org/jira/browse/HDDS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glen Geng reassigned HDDS-4740:
-------------------------------
Assignee: Glen Geng
> admin command should be regardless of leadership of SCM.
> --------------------------------------------------------
>
> Key: HDDS-4740
> URL: https://issues.apache.org/jira/browse/HDDS-4740
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM HA
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
>
> *Scope*
> Admin commands here are rm (ReplicationManager) start/stop and safe mode exit.
>
> *Requirement*
> 1, When the admin stops rm, rm must stop in every SCM, and a re-election must
> not start rm in the new leader.
> 2, When the admin starts rm, it takes effect only in an SCM that is the leader
> and out of safe mode. If the leader is in safe mode, an explicit admin start
> does not take effect.
> 3, An admin rm start/stop does not survive a restart of an SCM instance. When
> the admin decides to stop rm for the SCM cluster, they should watch for a
> crash of any SCM.
>
> *Status*
> 1, For now, admin rm start/stop creates/destroys the rm thread.
> 2, SCMContainerLocationFailoverProxyProvider is now proxied by
> FailoverProxyProvider (fpp); it round-robins over the SCMs in ozone.scm.names
> until the request is handled successfully. On the server side, every client
> request first goes through an isLeader check, and a non-leader returns a
> NotLeaderException (nle) to make fpp fail over to the next SCM.
> 3, SCMService decides whether the next rm iteration takes effect by
> switching between RUNNING and PAUSING.
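
The failover behaviour in Status item 2 can be sketched as below. This is a minimal illustration and not the Ozone implementation: `ScmEndpoint`, `FailoverSketch`, `invoke`, `endpoint` and the local `NotLeaderException` class are hypothetical names made up for the example, and the retry limit is an assumption.

```java
import java.util.List;

// Hypothetical stand-in for the not-leader error ("nle") returned by a non-leader SCM.
class NotLeaderException extends Exception {
  NotLeaderException(String msg) { super(msg); }
}

// Hypothetical model of one SCM endpoint; not the real Ozone protocol interface.
interface ScmEndpoint {
  String handle(String request) throws NotLeaderException;
}

class FailoverSketch {
  // Tries the SCMs in list order, wrapping around, until one handles the request.
  static String invoke(List<ScmEndpoint> scms, String request, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      ScmEndpoint scm = scms.get(attempt % scms.size());
      try {
        return scm.handle(request);
      } catch (NotLeaderException e) {
        // This SCM is not the leader: fail over to the next one in the list.
      }
    }
    throw new IllegalStateException("no SCM accepted the request");
  }

  // Builds an endpoint that performs the isLeader check before handling a request.
  static ScmEndpoint endpoint(String name, boolean isLeader) {
    return request -> {
      if (!isLeader) {
        throw new NotLeaderException(name + " is not the leader");
      }
      return name + " handled " + request;
    };
  }

  public static void main(String[] args) {
    List<ScmEndpoint> scms = List.of(
        endpoint("scm1", false), endpoint("scm2", false), endpoint("scm3", true));
    System.out.println(invoke(scms, "someRequest", 6));
  }
}
```

With this flow a request only ever lands on the leader, which is why an admin rm stop currently cannot reach the followers.
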
>
> *Solution:*
> When an SCM receives an rm stop/start request, the server side skips the
> isLeader check and simply destroys/creates the rm thread; the client side
> then fakes an exception to make fpp try the next SCM in round-robin order.
> The RUNNING/PAUSING status and admin rm start/stop can be treated separately:
> admin operations and raft status are requirements in two different dimensions.
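
A rough sketch of the proposed flow, assuming a simplified in-process model: each server applies the admin request with no leadership check, and the client raises a fake exception after every successful call so that the failover path moves on to the next SCM. `ScmServer`, `BroadcastSketch`, `RetryNextScm` and `broadcast` are hypothetical names for illustration only, not Ozone classes.

```java
import java.util.List;

// Hypothetical per-SCM state: only whether the rm thread currently exists.
class ScmServer {
  final String name;
  boolean rmThreadExists = true;

  ScmServer(String name) { this.name = name; }

  // Admin rm start/stop is applied directly, with no isLeader check.
  void handleAdminRm(boolean start) { rmThreadExists = start; }
}

class BroadcastSketch {
  // Hypothetical marker the client raises after a successful call to force failover.
  static class RetryNextScm extends RuntimeException { }

  // Reaches every SCM once by pretending that each successful call failed.
  static int broadcast(List<ScmServer> scms, boolean start) {
    int reached = 0;
    for (ScmServer scm : scms) {
      try {
        scm.handleAdminRm(start);
        reached++;
        throw new RetryNextScm(); // fake failure: the call actually succeeded
      } catch (RetryNextScm e) {
        // Handled like a failover signal: continue with the next SCM.
      }
    }
    return reached;
  }

  public static void main(String[] args) {
    List<ScmServer> scms = List.of(
        new ScmServer("scm1"), new ScmServer("scm2"), new ScmServer("scm3"));
    int reached = broadcast(scms, false); // admin rm stop
    System.out.println("SCMs reached: " + reached);
    for (ScmServer scm : scms) {
      System.out.println(scm.name + " rmThreadExists=" + scm.rmThreadExists);
    }
  }
}
```

The point of the fake exception is that the existing round-robin failover logic is reused as a broadcast mechanism instead of adding a separate fan-out path.
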
>
> *We can achieve the above requirements:*
> 1, When the admin stops rm, rm must stop in every SCM, and a re-election must
> not start rm in the new leader.
> Met: admin rm stop destroys the rm thread in all SCMs.
>
> 2, When the admin starts rm, it takes effect only in an SCM that is the leader
> and out of safe mode. If the leader is in safe mode, an explicit admin start
> does not take effect.
> Met: admin rm start creates the rm thread in all SCMs, but the RUNNING/PAUSING
> status is decided by leadership and safe mode.
>
> 3, An admin rm start/stop does not survive a restart of an SCM instance. When
> the admin decides to stop rm for the SCM cluster, they should watch for a
> crash of any SCM.
> Met. This is actually a relaxed requirement.
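
The three checks above can be modelled with two independent flags per SCM: one for the admin dimension (does the rm thread exist) and one derived from the raft dimension (RUNNING or PAUSING). The sketch below is an assumption-laden toy model, not Ozone code: `ScmNode`, `TwoDimensionSketch` and their methods are hypothetical, and it assumes a restarted SCM comes back as a follower in safe mode with its rm thread recreated.

```java
import java.util.List;

class ScmNode {
  enum ServiceStatus { RUNNING, PAUSING }

  boolean rmThreadExists = true; // admin dimension (assumed default: thread exists)
  boolean leader;                // raft dimension
  boolean inSafeMode;

  // RUNNING only for a leader that is out of safe mode.
  ServiceStatus status() {
    return (leader && !inSafeMode) ? ServiceStatus.RUNNING : ServiceStatus.PAUSING;
  }

  // rm does work only when both dimensions allow it.
  boolean rmEffective() {
    return rmThreadExists && status() == ServiceStatus.RUNNING;
  }

  // Assumption: a restart forgets the admin stop and comes back as a follower in safe mode.
  void restart() {
    rmThreadExists = true;
    leader = false;
    inSafeMode = true;
  }
}

class TwoDimensionSketch {
  // Admin rm start/stop is applied to every SCM, regardless of leadership.
  static void adminRm(List<ScmNode> cluster, boolean start) {
    for (ScmNode node : cluster) {
      node.rmThreadExists = start;
    }
  }

  // Makes exactly one node the leader.
  static void elect(List<ScmNode> cluster, int leaderIndex) {
    for (int i = 0; i < cluster.size(); i++) {
      cluster.get(i).leader = (i == leaderIndex);
    }
  }

  static long effectiveCount(List<ScmNode> cluster) {
    return cluster.stream().filter(ScmNode::rmEffective).count();
  }

  public static void main(String[] args) {
    List<ScmNode> cluster = List.of(new ScmNode(), new ScmNode(), new ScmNode());
    elect(cluster, 0);
    adminRm(cluster, false);
    elect(cluster, 1); // re-election after admin stop
    System.out.println("effective after stop + re-election: " + effectiveCount(cluster));
    adminRm(cluster, true);
    System.out.println("effective after start: " + effectiveCount(cluster));
    cluster.get(1).inSafeMode = true;
    System.out.println("effective with leader in safe mode: " + effectiveCount(cluster));
  }
}
```

Keeping the two flags separate is what lets a re-election flip RUNNING/PAUSING without ever recreating a thread the admin has destroyed.
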