GlenGeng commented on pull request #1784:
URL: https://github.com/apache/ozone/pull/1784#issuecomment-759909043


   @bshashikant  Thanks for reviewing this patch!
   
   > I also want to know, how are we handling case of a stale leader here? I 
know there is an effort inside ratis to address/detect a case of stale leader., 
but don't we really need anything specific to handle such case here ?
   
   Requests inside SCM fall into two categories: read/write requests and 
read-only requests.
   
   Read/write requests go through Ratis as a RaftClientRequest, so we don't 
need to care whether the underlying RaftServer is still the leader or has 
already stepped down. These requests include add/remove/update operations on 
pipelines, containers, and deleted blocks.
   
   Read-only requests do not go through Ratis, so a stale leader SCM might 
make a dangerous decision if it has missed the latest committed writes. This 
is the split-brain issue, in which two leaders of different terms are alive at 
the same time.
   
   Nanda pointed out such a case:
   > Our major concern here is the DeleteContainerCommand sent by the stale 
leader. It is possible that the ICR of the recently created container is 
received by the stale leader which has no idea about the new container. The 
stale leader will end up sending DeleteContainerCommand to the datanode in this 
case.
   
   See [SCM HA Handling Stale 
Leader](https://docs.google.com/document/d/1-5-KpR2GYIwWXGRH_C8IUVbFsm8RiETOVNYsMB5W8Ic/edit?usp=sharing)
 drafted by @nandakumar131 for more details.
   
   With the help of the lease solution, the leader SCM can handle the 
read-only request and then call `isLeader`/`getTermOfLeader` in `SCMContext` 
to confirm its leadership, ensuring that it had seen the latest committed 
writes when it handled the request.
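   
   To make the "handle first, confirm afterwards" order concrete, here is a 
minimal sketch. It is not the code in this patch: `LeaderView` and 
`readThenConfirm` are hypothetical names, the two query methods only mirror 
`isLeader`/`getTermOfLeader` by name, and comparing the term before and after 
the read is an assumption of the sketch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch only: none of these types are real Ozone or Ratis classes. */
final class StaleLeaderGuardSketch {

  /** Hypothetical stand-in for the leadership queries offered by SCMContext. */
  interface LeaderView {
    boolean isLeader();
    long getTermOfLeader();
  }

  /**
   * Computes a read-only decision first, then confirms leadership.
   * Returns an empty Optional if this node lost leadership, or moved to a
   * different term, while the decision was being computed.
   * The decision supplier must not return null.
   */
  static <T> Optional<T> readThenConfirm(LeaderView view, Supplier<T> decision) {
    // Assumption: remembering the term up front lets us detect a step-down
    // followed by a re-election that happened during the read.
    long termBefore = view.getTermOfLeader();
    T result = decision.get();
    boolean stillLeader =
        view.isLeader() && view.getTermOfLeader() == termBefore;
    return stillLeader ? Optional.of(result) : Optional.empty();
  }

  private StaleLeaderGuardSketch() { }
}
```

   A caller would act on the decision (for example, sending a command to a 
datanode) only when the returned `Optional` is present, and drop it otherwise.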
   
   > Also, the transitional cases seem to be handled for LEADER_TO_FOLLOWER and 
FOLLOWER_TO_LEADER? How about transition to CANDIDATE state in general?
   
   The candidate state is not exposed by Ratis, and SCM does not care about 
this state either. SCM just treats a candidate the same as a follower: it 
should stop its work if it is not the leader.
   
   `SCMServiceManager#leaderToFollower` is called in 
`StateMachine#notifyNotLeader`, and `SCMServiceManager#followerToLeader` is 
called in `StateMachine#notifyLeaderChanged` when 
`groupMemberId.getPeerId().equals(newLeaderId)`.
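   
   The wiring can be sketched roughly as follows. This is an illustration 
only: the classes below are hypothetical stand-ins, the callback signatures 
are simplified (peer IDs are plain strings), and the idempotence guard in the 
manager is an assumption of the sketch rather than something stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: simplified stand-ins, not the real SCM or Ratis classes. */
final class LeaderTransitionSketch {

  /** Hypothetical leader-only background service. */
  interface Service {
    void start();
    void stop();
  }

  /** Stand-in for SCMServiceManager: starts or stops services on transitions. */
  static final class ServiceManager {
    private final List<Service> services = new ArrayList<>();
    private boolean leader;

    void register(Service service) {
      services.add(service);
    }

    void followerToLeader() {
      if (!leader) {            // assumption: ignore repeated notifications
        leader = true;
        services.forEach(Service::start);
      }
    }

    void leaderToFollower() {
      if (leader) {
        leader = false;
        services.forEach(Service::stop);
      }
    }
  }

  /** Stand-in for the state machine callbacks mentioned above. */
  static final class StateMachineCallbacks {
    private final String selfPeerId;
    private final ServiceManager manager;

    StateMachineCallbacks(String selfPeerId, ServiceManager manager) {
      this.selfPeerId = selfPeerId;
      this.manager = manager;
    }

    /** Any non-leader state, candidate included, stops leader-only work. */
    void notifyNotLeader() {
      manager.leaderToFollower();
    }

    /** Leader-only work starts only when this peer is the new leader. */
    void notifyLeaderChanged(String newLeaderId) {
      if (selfPeerId.equals(newLeaderId)) {
        manager.followerToLeader();
      }
    }
  }

  private LeaderTransitionSketch() { }
}
```

   In this shape there is no separate candidate branch: a node that is not 
confirmed as leader simply never runs the leader-only services.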
   
   We could have an offline talk to discuss the subtle cases in how we 
integrate SCM HA with Ratis in more detail.

