[
https://issues.apache.org/jira/browse/RATIS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699521#comment-17699521
]
Tsz-wo Sze commented on RATIS-1770:
-----------------------------------
When reviewing [https://github.com/apache/ratis/pull/845] , found that the
current code has the following issues. Please see if they should be fixed in
the same pull request or separately; see [^845_review.patch]
- It is a very good idea to always have a TransferLeadershipRequest to trigger
a transfer leadership since, when there is another trigger, it can see the
previous request and be handled correspondingly. Currently, there are three
code path to trigger a transfer leadership. We should fix all of them
## a TransferLeadershipRequest from a client // already fixed
## checkPeersForYieldingLeader() // fixed by the pull request
## onFollowerAppendEntriesReply(..) // not yet fixed
- In the three cases above, leaderLastEntry and the follower LogAppender may
or may not be available from the beginning depending on the case. When they are
available, they should be passed so that they can be reused. Especially for
leaderLastEntry, it will be detected if it is changed in
sendStartLeaderElection(..).
We can introduce a Context class to deal with it.
- Some of the current methods such as tryTransferLeadership(..),
sendStartLeaderElection(..), etc. may fail. They should return an error message
in such cases. Then, it could fail the pending request immediately. Currently,
it may not fail the pending request and wait until timed out.
- In checkPeersForYieldingLeader(), there could be multiple followers having
the same highest priority. It only checks one of them currently. Instead, it
should check each follower.
- We should use serverExecutor to call startLeaderElection(r).
- Move the code from finishTransferLeadership() to onGroupLeaderElected() and
then remove finishTransferLeadership().
- TransferLeadership should use RaftServerConfigKeys.Rpc.requestTimeout
instead of server.getRandomElectionTimeout().
> Yield leader to higher priority peer by TransferLeadership
> ----------------------------------------------------------
>
> Key: RATIS-1770
> URL: https://issues.apache.org/jira/browse/RATIS-1770
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Assignee: Kaijie Chen
> Priority: Minor
> Attachments: 845_review.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Followup RATIS-1762.
> There might be race conditions between priority-based YieldLeadership and
> user-requested TransferLeadership. For example:
> ||Node||Role||Priority||
> |Peer 1|Leader|0|
> |Peer 2|Follower|1|
> |Peer 3|Follower|1|
> If user requested TransferLeadership to peer 3, while the YieldLeadership
> found peer 2 has higher priority than current leader.
> Peer 1 will send StartLeaderElection to both peer 2 and peer 3, and there
> might be a race condition (although it's benign).
> One immediate thought is to use the new TransferLeadership to yield
> leadership to higher priority peer.
> But it may cause following problems as quoted:
> {quote}If the higher priority peer lags behind a lot, it may take some time
> to catch up the latest transaction. If the prior leader reject client
> requests, then the service may be unavailable for a long time.
> {quote}
> To solve this problem, the old leader should only start TransferLeadership
> *iff* the higher priority peer is up-to-date.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)