[ https://issues.apache.org/jira/browse/HDDS-12109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930913#comment-17930913 ]

Ivan Andika edited comment on HDDS-12109 at 2/27/25 2:17 AM:
-------------------------------------------------------------

> Do we have some epic ticket to track the stability improvement?

Not at the moment. We are working on preventing issues that can be caused by 
admin operation errors. I'm just letting the relevant people working on this 
know so they can provide input and review the patches.


was (Author: JIRAUSER298977):
> Do we have some epic ticket to track the stability improvement?

No. We are working on preventing issues that can be caused by admin operation 
errors. I'm just letting the relevant people working on this know so they can 
provide input and review the patches.

> Transfer leadership should not start until target SCM is out of safe mode
> -------------------------------------------------------------------------
>
>                 Key: HDDS-12109
>                 URL: https://issues.apache.org/jira/browse/HDDS-12109
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Peter Lee
>            Priority: Major
>
> We encountered an incident where an administrator restarted an SCM and 
> immediately transferred leadership to it while it was still in safe mode. 
> The SCM became the leader while still in safe mode, but it could not serve 
> any requests, so user write requests blocked until the new leader SCM left 
> safe mode.
> We can add a mechanism that prevents leadership transfer while the target 
> SCM is still in safe mode.
> This can be implemented on either the Ozone or the Ratis side. For Ratis, one 
> option is to add another StateMachine API that checks whether a follower is 
> ready for a leadership transfer. However, I think a simple check of 
> scmClient#inSafeMode should suffice, provided we change it so that 
> scmClient#inSafeMode is not directed to the leader (it has to report the 
> target SCM's own safe-mode state, not the leader's).
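A minimal sketch of the Ozone-side guard proposed above, assuming the safe-mode query can be sent to a specific SCM rather than routed to the leader. Every name here (SafeModeProbe, LeadershipTransferrer, SafeModeAwareTransfer, transferIfReady) is hypothetical and is not an existing Ozone or Ratis API; the real scmClient#inSafeMode and the Ratis transfer call would sit behind the two interfaces. The Ratis-side StateMachine alternative is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical: asks a specific SCM (not the leader) whether it is in safe mode. */
interface SafeModeProbe {
    boolean isInSafeMode(String scmId);
}

/** Hypothetical: performs the actual leadership transfer (e.g. via Ratis). */
interface LeadershipTransferrer {
    void transferTo(String scmId);
}

/** Hypothetical guard that refuses to transfer leadership to an SCM still in safe mode. */
final class SafeModeAwareTransfer {
    private final SafeModeProbe probe;
    private final LeadershipTransferrer transferrer;

    SafeModeAwareTransfer(SafeModeProbe probe, LeadershipTransferrer transferrer) {
        this.probe = Objects.requireNonNull(probe);
        this.transferrer = Objects.requireNonNull(transferrer);
    }

    /** Transfers leadership only if the target has left safe mode; returns whether it did. */
    boolean transferIfReady(String targetScmId) {
        // The probe must reach the target SCM itself: a leader-routed query
        // would report the leader's safe-mode state instead of the target's.
        if (probe.isInSafeMode(targetScmId)) {
            return false;
        }
        transferrer.transferTo(targetScmId);
        return true;
    }
}

/** Usage demo with in-memory stand-ins for the probe and the transferrer. */
class TransferGuardDemo {
    public static void main(String[] args) {
        Map<String, Boolean> safeMode = Map.of("scm1", false, "scm2", true);
        List<String> transfers = new ArrayList<>();
        SafeModeAwareTransfer guard = new SafeModeAwareTransfer(safeMode::get, transfers::add);
        System.out.println(guard.transferIfReady("scm2")); // false: scm2 is in safe mode
        System.out.println(guard.transferIfReady("scm1")); // true
        System.out.println(transfers);                     // [scm1]
    }
}
```

Returning a boolean keeps the sketch easy to test; an admin command would more likely fail with an error telling the operator to wait until the target SCM leaves safe mode.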



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
