[ 
https://issues.apache.org/jira/browse/RATIS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699521#comment-17699521
 ] 

Tsz-wo Sze commented on RATIS-1770:
-----------------------------------

When reviewing [https://github.com/apache/ratis/pull/845] , found that the 
current code has the following issues. Please see if they should be fixed in 
the same pull request or separately; see [^845_review.patch]
 - It is a very good idea to always have a TransferLeadershipRequest to trigger 
a transfer leadership since, when there is another trigger, it can see the 
previous request and be handled correspondingly. Currently, there are three 
code path to trigger a transfer leadership.  We should fix all of them
 ## a TransferLeadershipRequest from a client // already fixed
 ## checkPeersForYieldingLeader() // fixed by the pull request
 ## onFollowerAppendEntriesReply(..) // not yet fixed

 - In the three cases above, leaderLastEntry and the follower LogAppender may 
or may not be available from the beginning depending on the case. When they are 
available, they should be passed so that they can be reused. Especially for 
leaderLastEntry, it will be detected if it is changed in 
sendStartLeaderElection(..).
We can introduce a Context class to deal with it.

 - Some of the current methods such as tryTransferLeadership(..), 
sendStartLeaderElection(..), etc. may fail. They should return an error message 
in such cases. Then, it could fail the pending request immediately. Currently, 
it may not fail the pending request and wait until timed out.

 - In checkPeersForYieldingLeader(), there could be multiple followers having 
the same highest priority. It only checks one of them currently. Instead, it 
should check each follower.

 - We should use serverExecutor to call startLeaderElection(r).

 - Move the code from finishTransferLeadership() to onGroupLeaderElected() and 
then remove finishTransferLeadership().

 - TransferLeadership should use RaftServerConfigKeys.Rpc.requestTimeout 
instead of server.getRandomElectionTimeout().

> Yield leader to higher priority peer by TransferLeadership
> ----------------------------------------------------------
>
>                 Key: RATIS-1770
>                 URL: https://issues.apache.org/jira/browse/RATIS-1770
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Assignee: Kaijie Chen
>            Priority: Minor
>         Attachments: 845_review.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Followup RATIS-1762.
> There might be race conditions between priority-based YieldLeadership and 
> user-requested TransferLeadership. For example:
> ||Node||Role||Priority||
> |Peer 1|Leader|0|
> |Peer 2|Follower|1|
> |Peer 3|Follower|1|
> If user requested TransferLeadership to peer 3, while the YieldLeadership 
> found peer 2 has higher priority than current leader.
> Peer 1 will send StartLeaderElection to both peer 2 and peer 3, and there 
> might be a race condition (although it's benign).
> One immediate thought is to use the new TransferLeadership to yield 
> leadership to higher priority peer.
> But it may cause following problems as quoted:
> {quote}If the higher priority peer lags behind a lot, it may take some time 
> to catch up the latest transaction. If the prior leader reject client 
> requests, then the service may be unavailable for a long time.
> {quote}
> To solve this problem, the old leader should only start TransferLeadership 
> *iff* the higher priority peer is up-to-date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to