bshashikant opened a new pull request #2114:
URL: https://github.com/apache/ozone/pull/2114


   
   ## What changes were proposed in this pull request?
   In SCM HA, the primary node starts the Ratis server, and the other
bootstrapping nodes are then added to the Ratis group. If all the
bootstrapping SCMs are stopped, the primary node steps down from leadership
because it loses its majority. If the bootstrapping nodes are then
bootstrapped again, each one first tries to validate its persisted cluster ID
against the cluster ID held by the leader SCM. Because no leader exists,
bootstrapping keeps failing and retrying until the node shuts down.
   
   The issue is easy to reproduce in Kubernetes deployments, where the
bootstrap and init commands are run on every restart.
   
   This Jira proposes skipping the cluster ID validation when a bootstrapping
node already has a persisted cluster ID.
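   The decision described above can be illustrated with a minimal Java sketch. It is not the code in this patch: `BootstrapSketch`, `LeaderClient`, `Outcome`, and the retry budget are hypothetical names chosen for illustration, and the leader lookup stands in for whatever RPC SCM actually uses to fetch the leader's cluster ID. The `skipIfPersisted` flag lets the same method model both the old behavior (always ask the leader) and the proposed one (skip validation when a cluster ID is already persisted).

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the SCM bootstrap decision described in this PR.
 * None of these names are Ozone's actual classes or methods.
 */
public final class BootstrapSketch {

  /** Hypothetical stand-in for an RPC asking the current SCM leader for its cluster ID. */
  interface LeaderClient {
    /** Returns the leader's cluster ID, or empty if no leader is currently elected. */
    Optional<String> fetchClusterId();
  }

  enum Outcome { SKIPPED_VALIDATION, VALIDATED, MISMATCH, FAILED_NO_LEADER }

  /**
   * Decides how a bootstrapping SCM proceeds.
   *
   * @param persistedClusterId cluster ID already on disk from an earlier bootstrap, if any
   * @param leader             hypothetical client used to query the leader SCM
   * @param skipIfPersisted    true models the proposed change; false models the old behavior
   * @param maxRetries         hypothetical retry budget before the node gives up
   */
  static Outcome bootstrap(Optional<String> persistedClusterId, LeaderClient leader,
                           boolean skipIfPersisted, int maxRetries) {
    // Proposed behavior: a node that already holds a cluster ID does not contact the leader.
    if (skipIfPersisted && persistedClusterId.isPresent()) {
      return Outcome.SKIPPED_VALIDATION;
    }
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      Optional<String> leaderId = leader.fetchClusterId();
      if (leaderId.isPresent()) {
        // A fresh node adopts the leader's ID; a re-bootstrapping node must match it.
        if (!persistedClusterId.isPresent()
            || persistedClusterId.get().equals(leaderId.get())) {
          return Outcome.VALIDATED;
        }
        return Outcome.MISMATCH;
      }
      // No leader: the primary stepped down after losing its majority, so retry.
    }
    // Retries exhausted with no leader available; the node shuts down.
    return Outcome.FAILED_NO_LEADER;
  }

  public static void main(String[] args) {
    LeaderClient noLeader = Optional::empty;
    Optional<String> persisted = Optional.of("CID-1234");
    System.out.println(bootstrap(persisted, noLeader, false, 5)); // FAILED_NO_LEADER
    System.out.println(bootstrap(persisted, noLeader, true, 5));  // SKIPPED_VALIDATION
  }
}
```

   In this sketch, a brand-new node with no persisted ID still has to reach the leader, so only re-bootstrapping nodes benefit from the bypass. The trade-off is that a node whose persisted ID differs from the cluster's is no longer rejected at bootstrap time.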
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-5062
   
   ## How was this patch tested?
   Added a unit test.
   

