[
https://issues.apache.org/jira/browse/HDDS-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-12109:
-------------------------------
Summary: Transfer leadership should not start until target SCM is out of
safe mode (was: Transfer leadership should not run until target SCM is out of
safe mode)
> Transfer leadership should not start until target SCM is out of safe mode
> -------------------------------------------------------------------------
>
> Key: HDDS-12109
> URL: https://issues.apache.org/jira/browse/HDDS-12109
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Priority: Major
>
> We encountered an incident where an administrator restarted an SCM and
> immediately transferred leadership to it while it was still in safe mode.
> The transfer succeeded even though the target SCM was in safe mode.
> However, the new leader could not serve any requests, so user write requests
> blocked until the new leader SCM left safe mode.
> We can add a mechanism that rejects a leadership transfer if the target SCM
> is still in safe mode.
> This can be implemented on either the Ozone side or the Ratis side. For
> Ratis, one possible idea is to add another StateMachine API that checks
> whether a follower is ready for a leader transfer. However, I think adding a
> simple check of scmClient#inSafeMode should suffice.
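
The Ozone-side check proposed above could look roughly like the sketch below. It is only an illustration, not the actual patch: `ScmAdminClient`, `FakeScm`, and `transferIfReady` are hypothetical names, and the sketch assumes the client is talking to the target SCM itself, since a safe-mode answer from the current leader would say nothing about the target.

```java
import java.io.IOException;

public class TransferLeadershipGuard {

  /** Hypothetical stand-in for an SCM client; only the two calls the sketch needs. */
  interface ScmAdminClient {
    boolean inSafeMode() throws IOException;
    void transferLeadership(String newLeaderId) throws IOException;
  }

  /**
   * Rejects the transfer while the target SCM is still in safe mode.
   * Assumption: 'target' is connected to the SCM that would become leader.
   */
  static void transferIfReady(ScmAdminClient target, String newLeaderId)
      throws IOException {
    if (target.inSafeMode()) {
      throw new IOException("Refusing to transfer leadership: target SCM "
          + newLeaderId + " is still in safe mode");
    }
    target.transferLeadership(newLeaderId);
  }

  /** In-memory fake used to exercise the guard without a cluster. */
  static final class FakeScm implements ScmAdminClient {
    private final boolean safeMode;
    String leaderId; // stays null until a transfer is accepted

    FakeScm(boolean safeMode) {
      this.safeMode = safeMode;
    }

    @Override
    public boolean inSafeMode() {
      return safeMode;
    }

    @Override
    public void transferLeadership(String newLeaderId) {
      this.leaderId = newLeaderId;
    }
  }

  public static void main(String[] args) {
    FakeScm restarted = new FakeScm(true); // simulates an SCM that was just restarted
    try {
      transferIfReady(restarted, "scm2");
    } catch (IOException e) {
      // The guard rejects the request instead of handing over leadership.
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast with an error keeps the decision with the administrator, who can retry once the target has left safe mode. Waiting inside the command until safe mode exits would be an alternative design, at the cost of a long-running admin call.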