[jira] [Commented] (RATIS-1770) Yield leader to higher priority peer by TransferLeadership

Kaijie Chen (Jira) Mon, 13 Mar 2023 05:32:09 -0700


    [ https://issues.apache.org/jira/browse/RATIS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699608#comment-17699608 ]


Kaijie Chen commented on RATIS-1770:
------------------------------------

Thanks [~szetszwo] for reviewing and proposing the patch. I disagree on a few 
points; the rest looks good.
{quote} - It is a very good idea to always create a TransferLeadershipRequest to 
trigger a leadership transfer since, when another trigger arrives, it can see 
the previous request and handle it accordingly. Currently, there are three 
code paths that trigger a leadership transfer. We should fix all of them:
 ## a TransferLeadershipRequest from a client // already fixed
 ## checkPeersForYieldingLeader() // fixed by the pull request
 ## onFollowerAppendEntriesReply(..) // not yet fixed{quote}
I did not intend to "fix" case 3, because it is a low-level operation that is 
part of an ongoing TransferLeadership, not a new trigger.
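A minimal Java sketch of the "one pending request per leader" idea, so that a later trigger can see the earlier one. The classes {{TransferCoordinator}} and {{PendingTransfer}} are hypothetical and not the Ratis API; the policy of joining a request for the same transferee and rejecting one for a different transferee is an assumption chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the Ratis API: at most one pending transfer per leader.
final class PendingTransfer {
  final String target;
  final CompletableFuture<String> result = new CompletableFuture<>();

  PendingTransfer(String target) {
    this.target = target;
  }
}

final class TransferCoordinator {
  private final AtomicReference<PendingTransfer> pending = new AtomicReference<>();

  // Every trigger (client request, priority-based yield) goes through here,
  // so a later trigger can see the earlier pending request.
  CompletableFuture<String> transfer(String target) {
    PendingTransfer fresh = new PendingTransfer(target);
    PendingTransfer existing = pending.compareAndExchange(null, fresh);
    if (existing == null) {
      // No transfer in progress: this trigger owns the new request.
      return fresh.result;
    }
    if (existing.target.equals(target)) {
      // Same transferee: join the in-flight request instead of starting another one.
      return existing.result;
    }
    // Different transferee: reject instead of racing a second StartLeaderElection.
    return CompletableFuture.failedFuture(new IllegalStateException(
        "transfer to " + existing.target + " already pending; rejected " + target));
  }

  // Called once leadership has moved; completes every caller sharing the request.
  void finish(String newLeader) {
    PendingTransfer p = pending.getAndSet(null);
    if (p != null) {
      p.result.complete(newLeader);
    }
  }
}
```

Under this policy, the internal retry in case 3 would not call {{transfer(..)}} again; it only continues the request that is already pending.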

When {{tryTransferLeadership()}} fails because the transferee is not 
up-to-date, the leader notifies the {{LogAppender}} to bring the transferee up 
to date, and {{onFollowerAppendEntriesReply()}} is then called to retry 
{{sendStartLeaderElection()}}.
{quote} - Some of the current methods, such as tryTransferLeadership(..) and 
sendStartLeaderElection(..), may fail. They should return an error message in 
such cases so that the pending request can fail immediately. Currently, the 
pending request may not fail and instead waits until it times out.{quote}
TransferLeadership should not fail immediately if the transferee is not 
up-to-date (as in the case above).
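A simplified Java model of the distinction above: fatal errors fail the pending request at once, while a lagging transferee is treated as retryable and the election is retried from the append-entries reply path. {{TransferFlow}}, its {{Outcome}} enum, the membership check, and the match-index comparison are hypothetical stand-ins for {{tryTransferLeadership()}}, {{onFollowerAppendEntriesReply()}} and {{sendStartLeaderElection()}}, not the actual Ratis code.

```java
// Hypothetical model of the leader-side transfer flow; not the Ratis implementation.
final class TransferFlow {
  enum Outcome { ELECTION_STARTED, WAITING_FOR_CATCH_UP, FAILED }

  private final long leaderLastIndex;
  private String waitingFor; // transferee being caught up, if any
  String failureMessage;     // set only for fatal failures
  int electionsSent;         // counts stand-in calls to sendStartLeaderElection()

  TransferFlow(long leaderLastIndex) {
    this.leaderLastIndex = leaderLastIndex;
  }

  // Plays the role of tryTransferLeadership(): returns FAILED only for fatal cases.
  Outcome tryTransfer(String transferee, boolean isMember, long transfereeMatchIndex) {
    if (!isMember) {
      failureMessage = transferee + " is not in the configuration";
      return Outcome.FAILED; // fail the pending request immediately
    }
    if (transfereeMatchIndex < leaderLastIndex) {
      waitingFor = transferee; // not fatal: let the log appender catch it up
      return Outcome.WAITING_FOR_CATCH_UP;
    }
    electionsSent++;
    return Outcome.ELECTION_STARTED;
  }

  // Plays the role of onFollowerAppendEntriesReply(): retry once the transferee caught up.
  boolean onAppendReply(String follower, long matchIndex) {
    if (follower.equals(waitingFor) && matchIndex >= leaderLastIndex) {
      waitingFor = null;
      electionsSent++; // retry the election request
      return true;
    }
    return false;
  }
}
```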
{quote} - TransferLeadership should use RaftServerConfigKeys.Rpc.requestTimeout 
instead of server.getRandomElectionTimeout().{quote}
By default, a leader election should not block the raft group for long; it 
normally completes within a random election timeout.
Maybe we should make ratis-shell use 
{{RaftServerConfigKeys.Rpc.requestTimeout}} as its default instead.
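A small Java sketch of the two defaults being proposed: the server falls back to a randomized election-style timeout, while a client tool such as ratis-shell falls back to a request timeout when the user gives none. {{TransferTimeouts}} and both method names are hypothetical; the real values would come from {{RaftServerConfigKeys}}, which this sketch does not read.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; real values would come from RaftServerConfigKeys.
final class TransferTimeouts {
  // Server-side default: a random value in [min, max), like a randomized election timeout.
  static Duration serverDefault(Duration minElection, Duration maxElection) {
    long ms = ThreadLocalRandom.current()
        .nextLong(minElection.toMillis(), maxElection.toMillis());
    return Duration.ofMillis(ms);
  }

  // Client-side default (e.g. a shell command): use the explicit value if given,
  // otherwise fall back to the configured request timeout.
  static Duration clientTimeout(Duration explicit, Duration requestTimeout) {
    return explicit != null ? explicit : requestTimeout;
  }
}
```

Keeping the short server default bounds how long the group can be disrupted, while the longer client default only affects how long the caller waits for an answer.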

> Yield leader to higher priority peer by TransferLeadership
> ----------------------------------------------------------
>
>                 Key: RATIS-1770
>                 URL: https://issues.apache.org/jira/browse/RATIS-1770
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Assignee: Kaijie Chen
>            Priority: Minor
>         Attachments: 845_review.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Followup RATIS-1762.
> There might be race conditions between priority-based YieldLeadership and 
> user-requested TransferLeadership. For example:
> ||Node||Role||Priority||
> |Peer 1|Leader|0|
> |Peer 2|Follower|1|
> |Peer 3|Follower|1|
> Suppose a user requests TransferLeadership to peer 3 while YieldLeadership 
> finds that peer 2 has a higher priority than the current leader.
> Peer 1 will then send StartLeaderElection to both peer 2 and peer 3, and 
> there might be a race condition (although it is benign).
> One immediate thought is to use the new TransferLeadership to yield 
> leadership to the higher-priority peer.
> But this may cause the following problem, as quoted:
> {quote}If the higher-priority peer lags far behind, it may take some time 
> to catch up to the latest transaction. If the prior leader rejects client 
> requests in the meantime, the service may be unavailable for a long time.
> {quote}
> To solve this problem, the old leader should only start TransferLeadership 
> *iff* the higher priority peer is up-to-date.
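A minimal Java sketch of the "*iff* up-to-date" rule applied to the table above. It assumes "up-to-date" means the follower's match index has reached the leader's last log index; {{PeerInfo}}, {{YieldPolicy}} and the match-index values in the usage are hypothetical, not taken from Ratis.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical follower view: priority plus how far its log is replicated.
final class PeerInfo {
  final String id;
  final int priority;
  final long matchIndex;

  PeerInfo(String id, int priority, long matchIndex) {
    this.id = id;
    this.priority = priority;
    this.matchIndex = matchIndex;
  }
}

final class YieldPolicy {
  // Returns the peer to yield to, or empty if the leader should keep serving.
  // Only followers that outrank the leader AND have fully caught up qualify.
  static Optional<PeerInfo> pickTransferee(int leaderPriority, long leaderLastIndex,
                                           List<PeerInfo> followers) {
    return followers.stream()
        .filter(p -> p.priority > leaderPriority)
        .filter(p -> p.matchIndex >= leaderLastIndex)
        .max(Comparator.comparingInt((PeerInfo p) -> p.priority));
  }
}
```

With peer 1 as leader (priority 0, last index 100), peer 2 at match index 80 and peer 3 at 100, only peer 3 qualifies; if both followers lag, the result is empty and the leader keeps serving client requests instead of starting a transfer.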



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

