GlenGeng commented on pull request #1784: URL: https://github.com/apache/ozone/pull/1784#issuecomment-759909043
@bshashikant Thanks for reviewing this patch!

> I also want to know, how are we handling case of a stale leader here? I know there is an effort inside ratis to address/detect a case of stale leader., but don't we really need anything specific to handle such case here ?

Requests inside SCM fall into two categories: read/write requests and read-only requests.

Read/write requests go through Ratis as a `RaftClientRequest`, so we don't need to care whether the underlying RaftServer is still the leader or has already stepped down. These requests include adding, removing, and updating pipelines, containers, and deleted blocks.

Read-only requests do not go through Ratis. A stale leader SCM might make a dangerous decision if it misses the latest committed writes, i.e. a split-brain situation in which two leaders of different terms coexist. Nanda pointed out such a case:

> Our major concern here is the DeleteContainerCommand sent by the stale leader. It is possible that the ICR of the recently created container is received by the stale leader which has no idea about the new container. The stale leader will end up sending DeleteContainerCommand to the datanode in this case.

See [SCM HA Handling Stale Leader](https://docs.google.com/document/d/1-5-KpR2GYIwWXGRH_C8IUVbFsm8RiETOVNYsMB5W8Ic/edit?usp=sharing) drafted by @nandakumar131 for more details.

With the help of the lease solution, the leader SCM can handle the read-only request and then call `isLeader`/`getTermOfLeader` in `SCMContext` to confirm its leadership, ensuring that it had seen the latest committed writes when it handled the request.

> Also, the transitional cases seem to be handled for LEADER_TO_FOLLOWER and FOLLOWER_TO_LEADER? How about transition to CANDIDATE state in general?

The candidate state is not exposed by Ratis, and SCM does not care about this state either. SCM treats a candidate the same as a follower: it should stop its work if it is not the leader.
`SCMServiceManager#leaderToFollower` is called in `StateMachine#notifyNotLeader`, and `SCMServiceManager#followerToLeader` is called in `StateMachine#notifyLeaderChanged` when `groupMemberId.getPeerId().equals(newLeaderId)`.

We could have an offline talk to discuss the subtle cases in how we integrate SCM HA with Ratis.
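
To make the read-only path and the candidate handling described in this comment concrete, here is a minimal sketch. It is not the code in this patch: `LeaderContextSketch`, `ReadOnlyHandlerSketch`, `onRoleChange`, the `Role` enum, and the command string are hypothetical names made up for illustration, and the lease itself is not modeled. It only shows the ordering (decide first, confirm leadership afterwards) and that a candidate is treated like a follower.

```java
import java.util.Optional;
import java.util.OptionalLong;

/** Hypothetical stand-in for SCMContext; NOT the real Ozone class. */
final class LeaderContextSketch {
  enum Role { LEADER, FOLLOWER, CANDIDATE }

  private boolean leader = false;
  private long term = 0L;

  /** Assumed to be driven by the state machine's leadership notifications. */
  synchronized void onRoleChange(Role role, long newTerm) {
    // A candidate is treated exactly like a follower: only LEADER may act.
    this.leader = (role == Role.LEADER);
    this.term = newTerm;
  }

  synchronized boolean isLeader() {
    return leader;
  }

  /** Term of this node's leadership, or empty if it is not the leader. */
  synchronized OptionalLong getTermOfLeader() {
    return leader ? OptionalLong.of(term) : OptionalLong.empty();
  }
}

/** Hypothetical read-only handler; the command string is illustrative only. */
final class ReadOnlyHandlerSketch {
  static Optional<String> handleUnknownContainerReport(
      LeaderContextSketch ctx, long containerId) {
    // Step 1: make the decision from local state, without going through Ratis.
    String command = "DeleteContainerCommand(" + containerId + ")";
    // Step 2: confirm leadership AFTER the decision; drop it if leadership was lost.
    OptionalLong term = ctx.getTermOfLeader();
    if (!term.isPresent()) {
      return Optional.empty();
    }
    return Optional.of(command + "@term=" + term.getAsLong());
  }
}
```

In this sketch a node that is a follower or a candidate never emits the command, and a leader tags the command with the term under which it confirmed its leadership.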
