bharatviswa504 edited a comment on pull request #2294: URL: https://github.com/apache/ozone/pull/2294#issuecomment-852885323
With this approach we see an issue.

Before the restart, 2 pipelines were closed; let's say SCM removed them and created a new pipeline. After the restart, the SCM pipeline table still has the 2 old pipelines, because the remove/create pipeline transactions were not persisted to the DB when SCM was force killed. Because we call refresh and validate the safe mode pipeline rules on each applyTransaction, we exit safe mode after the 2nd pipeline remove transaction, without waiting for all the pending transactions to be applied. In this case SCM comes out of safe mode early, and reads/writes might fail even though SCM is out of safe mode.

```
2021-06-02 05:51:04,208 INFO org.apache.hadoop.hdds.scm.safemode.HealthyPipelineSafeModeRule: Refreshed total pipeline count is 1, healthy pipeline threshold count is 1
2021-06-02 05:51:04,208 INFO org.apache.hadoop.hdds.scm.safemode.OneReplicaPipelineSafeModeRule: Total pipeline count is 1, pipeline's with at least one datanode reported threshold count is 1
2021-06-02 05:51:04,209 INFO org.apache.hadoop.hdds.scm.safemode.HealthyPipelineSafeModeRule: Refreshed total pipeline count is 0, healthy pipeline threshold count is 0
2021-06-02 05:51:04,209 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: HealthyPipelineSafeModeRule rule is successfully validated
2021-06-02 05:51:04,209 INFO org.apache.hadoop.hdds.scm.safemode.OneReplicaPipelineSafeModeRule: Total pipeline count is 0, pipeline's with at least one datanode reported threshold count is 0
```

After an offline discussion with @bshashikant:

1. We thought we should refresh the SCM safe mode rules once, after leader ready, on all SCMs.
2. And start the DN RPC port only after leader ready, so that SCM does not come out of safe mode early by considering a DB that is not up to date.
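To make the failure mode concrete, here is a rough toy model of the replay, not the actual SCM code. All names (`SafeModeReplaySketch`, `PipelineTx`, `ruleSatisfied`, `exitsSafeModeDuringReplay`) are hypothetical, and the rule is simplified on the assumption that the healthy pipeline threshold equals the total pipeline count and that no datanode has reported yet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are Ozone APIs.
class SafeModeReplaySketch {

  /** One replayed transaction that creates or removes a pipeline (hypothetical). */
  static final class PipelineTx {
    final boolean create;
    final String pipelineId;

    PipelineTx(boolean create, String pipelineId) {
      this.create = create;
      this.pipelineId = pipelineId;
    }
  }

  /** Simplified rule (assumption): threshold == total pipeline count. */
  static boolean ruleSatisfied(int totalPipelines, int healthyReported) {
    int threshold = totalPipelines;
    return healthyReported >= threshold;
  }

  /**
   * Replays pending transactions on top of a stale pipeline table.
   * Returns true if safe mode is exited although no datanode has reported.
   */
  static boolean exitsSafeModeDuringReplay(Set<String> staleTable,
      List<PipelineTx> pending, boolean validateOnEachApply) {
    Set<String> table = new HashSet<>(staleTable);
    int healthyReported = 0; // assumption: DN reports have not arrived yet
    boolean inSafeMode = true;

    for (PipelineTx tx : pending) {
      if (tx.create) {
        table.add(tx.pipelineId);
      } else {
        table.remove(tx.pipelineId);
      }
      // Refresh and validate after every applied transaction.
      if (validateOnEachApply && inSafeMode) {
        inSafeMode = !ruleSatisfied(table.size(), healthyReported);
      }
    }

    // Refresh and validate once, after all pending transactions are applied.
    if (!validateOnEachApply) {
      inSafeMode = !ruleSatisfied(table.size(), healthyReported);
    }
    return !inSafeMode;
  }

  public static void main(String[] args) {
    Set<String> stale = new HashSet<>(Arrays.asList("p1", "p2"));
    List<PipelineTx> pending = Arrays.asList(
        new PipelineTx(false, "p1"),
        new PipelineTx(false, "p2"),
        new PipelineTx(true, "p3"));

    System.out.println("validate per apply, early exit: "
        + exitsSafeModeDuringReplay(stale, pending, true));
    System.out.println("validate after replay, early exit: "
        + exitsSafeModeDuringReplay(stale, pending, false));
  }
}
```

In this model, validating per apply hits a pipeline count of 0 after the second remove, so the threshold drops to 0 and the rule passes before the new pipeline is created. Validating once after the replay sees 1 pipeline with no reports and stays in safe mode, which is the behaviour proposal 1 is after.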
