[
https://issues.apache.org/jira/browse/RATIS-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699750#comment-17699750
]
Kaijie Chen edited comment on RATIS-1770 at 3/13/23 6:02 PM:
-------------------------------------------------------------
(EDITED)
[~szetszwo] After going through the code, I think it is *not* OK to call
{{TransferLeadership#start()}} in {{onFollowerAppendEntriesReply()}},
because there will always be a pending request when
{{onFollowerAppendEntriesReply()}} is called. We may call
{{tryTransferLeadership()}} instead.
The problem is in the error handling *when the transferee is not
up-to-date*:
# On a client-requested leadership transfer, the leader should keep waiting.
# On a priority-based leadership yield, the leader can stop immediately.
Currently, it always takes the 2nd approach, causing this error in
{{testTransferLeader}}:
{code:java}
org.apache.ratis.protocol.exceptions.TransferLeadershipException: s2@group-DFEA0692C62E: Failed to transfer leadership to s0 (the current leader is s2): Follower not up-to-date: followerMatchIndex = 1 < leaderLastEntry.getIndex() = 2
    at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:102)
    at org.apache.ratis.server.impl.TransferLeadership.start(TransferLeadership.java:186)
    at org.apache.ratis.server.impl.TransferLeadership.start(TransferLeadership.java:173)
{code}
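To make the two cases concrete, here is a minimal sketch of the decision I have in mind. It is not the actual Ratis code: {{NotUpToDatePolicy}}, {{Source}}, {{Action}} and {{decide()}} are hypothetical names used only for illustration. The index comparison mirrors the check in the error message above.

```java
// Hypothetical sketch; none of these names exist in the Ratis code base.
final class NotUpToDatePolicy {
  /** Who initiated the leadership transfer. */
  enum Source { CLIENT_REQUEST, PRIORITY_YIELD }

  /** What the leader does after checking the transferee's progress. */
  enum Action { PROCEED, KEEP_WAITING, STOP }

  static Action decide(Source source, long followerMatchIndex, long leaderLastIndex) {
    if (followerMatchIndex >= leaderLastIndex) {
      // The transferee has caught up, so the transfer can go ahead.
      return Action.PROCEED;
    }
    // The transferee lags behind: a client request keeps waiting,
    // while a priority-based yield gives up immediately.
    return source == Source.CLIENT_REQUEST ? Action.KEEP_WAITING : Action.STOP;
  }
}
```

With the values from the failing test (followerMatchIndex = 1, leader last index = 2), a client request would keep waiting instead of completing with {{TransferLeadershipException}}.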
See this branch for the CI and fix:
https://github.com/kaijchen/ratis/commits/RATIS-1770-2
> Yield leader to higher priority peer by TransferLeadership
> ----------------------------------------------------------
>
> Key: RATIS-1770
> URL: https://issues.apache.org/jira/browse/RATIS-1770
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Assignee: Kaijie Chen
> Priority: Minor
> Attachments: 845_review.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Follow-up to RATIS-1762.
> There might be race conditions between priority-based YieldLeadership and
> user-requested TransferLeadership. For example:
> ||Node||Role||Priority||
> |Peer 1|Leader|0|
> |Peer 2|Follower|1|
> |Peer 3|Follower|1|
> If the user requests TransferLeadership to peer 3 while YieldLeadership
> finds that peer 2 has a higher priority than the current leader, peer 1
> will send StartLeaderElection to both peer 2 and peer 3, and there
> might be a race condition (although it's benign).
> One immediate thought is to use the new TransferLeadership to yield
> leadership to the higher priority peer.
> But it may cause the following problem, as quoted:
> {quote}If the higher priority peer lags behind a lot, it may take some time
> to catch up to the latest transaction. If the prior leader rejects client
> requests, then the service may be unavailable for a long time.
> {quote}
> To solve this problem, the old leader should start TransferLeadership
> *iff* the higher priority peer is up-to-date.
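The rule above could be sketched roughly as follows. This is only an illustration under assumptions: {{YieldTargetSketch}}, {{Peer}} and {{pickYieldTarget()}} are hypothetical names, not Ratis APIs, and "up-to-date" is assumed to mean that the follower's match index has reached the leader's last log index.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in the Ratis code base.
final class YieldTargetSketch {
  /** Minimal view of a follower as seen by the leader. */
  static final class Peer {
    final String id;
    final int priority;
    final long matchIndex;

    Peer(String id, int priority, long matchIndex) {
      this.id = id;
      this.priority = priority;
      this.matchIndex = matchIndex;
    }
  }

  /**
   * Returns the highest-priority follower that outranks the leader and is
   * up-to-date, or empty if there is none (the leader then keeps serving).
   */
  static Optional<Peer> pickYieldTarget(
      int leaderPriority, long leaderLastIndex, List<Peer> followers) {
    Peer best = null;
    for (Peer p : followers) {
      boolean higherPriority = p.priority > leaderPriority;
      boolean upToDate = p.matchIndex >= leaderLastIndex; // assumed definition
      if (higherPriority && upToDate && (best == null || p.priority > best.priority)) {
        best = p;
      }
    }
    return Optional.ofNullable(best);
  }
}
```

In the three-peer example from the table, if peer 2 is still catching up and peer 3 has caught up, only peer 3 qualifies. If both lag behind, the result is empty and the leader does not start a transfer, so it never has to reject client requests while waiting.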