[ https://issues.apache.org/jira/browse/RATIS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696736#comment-17696736 ]

Kaijie Chen commented on RATIS-1796:
------------------------------------

{quote}The old leader should have stepped down first. No?{quote}

The old leader will not step down first; the step-down is caused by the RequestVote 
from the transferee. Quoting the Raft dissertation: 

{quote}Once the target server receives the TimeoutNow request, it is highly 
likely to start an election before any other server and become leader in the 
next term. Its next message to the prior leader will include its new term 
number, causing the prior leader to step down. At this point, leadership 
transfer is complete.{quote}
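The step-down in the quote relies on the general Raft rule that a server which sees a higher term in any RPC adopts that term and reverts to follower. A minimal, self-contained model of that rule is sketched below; the `Server` class and its methods are illustrative names, not Ratis APIs.

```java
/** Minimal model of the Raft rule "a higher term in any RPC makes the receiver step down". */
public class StepDownOnHigherTerm {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    static final class Server {
        final String id;
        Role role;
        long currentTerm;

        Server(String id, Role role, long currentTerm) {
            this.id = id;
            this.role = role;
            this.currentTerm = currentTerm;
        }

        /** Transferee reacting to TimeoutNow: start an election in the next term. */
        void startElection() {
            currentTerm++;
            role = Role.CANDIDATE;
        }

        /** Applied to the term carried by every incoming RPC (e.g. RequestVote). */
        void onIncomingTerm(long term) {
            if (term > currentTerm) {
                currentTerm = term;    // adopt the newer term
                role = Role.FOLLOWER;  // a leader or candidate reverts to follower
            }
        }
    }

    public static void main(String[] args) {
        Server s0 = new Server("s0", Role.LEADER, 1);
        Server s1 = new Server("s1", Role.FOLLOWER, 1);

        s1.startElection();              // s1 received TimeoutNow, now in term 2
        s0.onIncomingTerm(s1.currentTerm); // s1's RequestVote reaches the old leader

        System.out.println(s0.id + " " + s0.role + " term=" + s0.currentTerm);
        // prints "s0 FOLLOWER term=2"
    }
}
```

Note that the old leader does nothing on its own here: its step-down is purely a reaction to the transferee's higher term.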

In the Ratis implementation, {{stepDownLeaderAsync}} is only called when 
{{newLeader}} is {{null}}:

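{code:java}
  CompletableFuture<RaftClientReply> transferLeadershipAsync(TransferLeadershipRequest request)
      throws IOException {
    if (request.getNewLeader() == null) {
      // no target peer given: fall back to a plain step-down
      return stepDownLeaderAsync(request);
    }
    // ... remainder of the method elided in this excerpt
  }
{code}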

This is a special case of the TransferLeadership RPC: a request without a new 
leader is treated as a plain step-down. Alternatively, it could be interpreted 
as transferring leadership to any other peer, which would reduce the downtime.
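A plain step-down presumably leaves the group without a leader until some peer's election timeout fires, whereas a transfer lets one chosen peer start an election right away. The sketch below assumes the null-target case would pick the follower with the highest match index; `pickTransferee` and the match-index map are hypothetical and do not exist in Ratis.

```java
import java.util.Map;
import java.util.Optional;

public class PickTransferee {
    /**
     * Hypothetical helper: choose the follower whose log is most up to date
     * (highest match index). Returns empty when there are no followers,
     * in which case a plain step-down would still be the only option.
     */
    static Optional<String> pickTransferee(Map<String, Long> matchIndexByPeer) {
        return matchIndexByPeer.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Long> matchIndex = Map.of("s1", 42L, "s2", 40L);
        System.out.println(pickTransferee(matchIndex).orElse("none")); // prints "s1"
    }
}
```

Picking the most up-to-date follower matters because a transferee with a shorter log could not win the election under Raft's voting restriction.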

----

Regarding the problem in this Jira:

{quote}I think I have found the problem:

# Transferee received startLeaderElection (RaftServerImpl#startLeaderElection:1700 -> RaftServerImpl#changeToCandidate:649 -> RoleInfo#startLeaderElection:121 -> start new thread LeaderElection).
# Transferee received appendEntries (stack trace in the log above) and became a follower.
# The LeaderElection thread from step 1 is running and finds that the CandidateState was already CLOSED by step 2.

The term of the transferee is expected to be increased in step 3 (LeaderElection#run:238 -> LeaderElection#askForVotes:304 -> ServerState#initElection:221 -> currentTerm.incrementAndGet).
But in this case, step 2 is executed before step 3, while the term has not been increased yet.{quote}
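The race in the three steps above can be reproduced with a small deterministic model. It assumes the transferee starts at term 1, matching the log, where the old leader's appendEntries carries term 1. The class and method names are hypothetical simplifications of the Ratis call chain, not its real code.

```java
public class TransfereeRace {
    enum Role { FOLLOWER, CANDIDATE }

    static final class Transferee {
        Role role = Role.FOLLOWER;
        long currentTerm = 1;
        boolean candidateStateClosed = false;

        // Step 1: startLeaderElection -> changeToCandidate. The term is NOT bumped here.
        void changeToCandidate() {
            role = Role.CANDIDATE;
            candidateStateClosed = false;
        }

        // Step 2: appendEntries from the old leader. Per Raft, a leader whose term
        // is >= our own is accepted, and a candidate reverts to follower.
        boolean onAppendEntries(long leaderTerm) {
            if (leaderTerm < currentTerm) {
                return false;                // stale leader: reject
            }
            currentTerm = leaderTerm;
            if (role == Role.CANDIDATE) {
                candidateStateClosed = true; // the running election is shut down
            }
            role = Role.FOLLOWER;
            return true;
        }

        // Step 3: the election thread bumps the term in askForVotes,
        // but only while its candidate state is still open.
        void askForVotes() {
            if (candidateStateClosed) {
                return;                      // too late: the term is never increased
            }
            currentTerm++;
        }
    }

    public static void main(String[] args) {
        Transferee s1 = new Transferee();
        s1.changeToCandidate();                   // step 1
        boolean accepted = s1.onAppendEntries(1); // step 2 arrives first
        s1.askForVotes();                         // step 3 runs afterwards
        System.out.println(accepted + " " + s1.role + " term=" + s1.currentTerm);
        // prints "true FOLLOWER term=1"
    }
}
```

If step 3 ran before step 2 instead, the transferee would already be at term 2 and the old leader's appendEntries would be rejected as stale.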

Maybe we can introduce a Pre-Candidate state alongside the Candidate state, 
and increase the term when a peer becomes a Candidate instead of in 
LeaderElection.

Reference: https://github.com/etcd-io/raft/blob/main/raft.go#L839-L866
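A rough sketch of the proposed split follows, loosely modeled on the Pre-Candidate/Candidate roles in etcd's raft: entering Pre-Candidate leaves the term untouched, while entering Candidate increments it as part of the role change itself. It also assumes, as etcd does for leader transfer as far as we understand, that the transferee skips the pre-vote and becomes a Candidate directly. All names are hypothetical.

```java
public class PreCandidateSketch {
    enum Role { FOLLOWER, PRE_CANDIDATE, CANDIDATE, LEADER }

    static final class Peer {
        Role role = Role.FOLLOWER;
        long currentTerm;

        Peer(long term) {
            currentTerm = term;
        }

        // Pre-vote phase: the peer would probe others with term + 1, but its own
        // term stays unchanged, so a peer that cannot win does not disrupt the group.
        void becomePreCandidate() {
            role = Role.PRE_CANDIDATE;
        }

        // The term is increased at the role transition itself, not later in a
        // separate election thread, so there is no window with a stale term.
        void becomeCandidate() {
            role = Role.CANDIDATE;
            currentTerm++;
        }

        boolean onAppendEntries(long leaderTerm) {
            if (leaderTerm < currentTerm) {
                return false;  // reject the stale leader
            }
            currentTerm = leaderTerm;
            role = Role.FOLLOWER;
            return true;
        }
    }

    public static void main(String[] args) {
        Peer s1 = new Peer(1);
        s1.becomeCandidate();                     // transfer: skip pre-vote (assumption)
        boolean accepted = s1.onAppendEntries(1); // late appendEntries from the old leader
        System.out.println(accepted + " " + s1.role + " term=" + s1.currentTerm);
        // prints "false CANDIDATE term=2"
    }
}
```

With this ordering, the same interleaving as in the log no longer demotes the transferee; the old leader instead learns about term 2 from the rejection and steps down.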

[~szetszwo] what do you think?



> TransferLeadership stopped by appendEntries from old leader
> -----------------------------------------------------------
>
>                 Key: RATIS-1796
>                 URL: https://issues.apache.org/jira/browse/RATIS-1796
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Assignee: Kaijie Chen
>            Priority: Major
>
> The Candidate state of the transferee may be stopped by appendEntries from the 
> old leader; see the log below:
> {code:java}
> 2023-02-28 04:52:45,026 [s0-server-thread1] INFO  impl.TransferLeadership (TransferLeadership.java:tryTransferLeadership(107)) - s0@group-43918D205BB2: start transferring leadership to s1
> 2023-02-28 04:52:45,029 [s0-server-thread1] INFO  impl.TransferLeadership (TransferLeadership.java:tryTransferLeadership(116)) - s0@group-43918D205BB2: sent StartLeaderElection to transferee s1 immediately as it already has up-to-date log
> 2023-02-28 04:52:45,031 [grpc-default-executor-6] INFO  impl.RoleInfo (RoleInfo.java:shutdownFollowerState(111)) - s1: shutdown s1@group-43918D205BB2-FollowerState
> 2023-02-28 04:52:45,032 [s1@group-43918D205BB2-FollowerState] INFO  impl.FollowerState (FollowerState.java:run(152)) - s1@group-43918D205BB2-FollowerState was interrupted
> 2023-02-28 04:52:45,032 [grpc-default-executor-6] INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(140)) - s1: start s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO  impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(131)) - s1: shutdown s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO  impl.RoleInfo (RoleInfo.java:startFollowerState(104)) - s1: startFollowerState reason:appendEntries from s0 term 1,
>   trace: java.base/java.lang.Thread.getStackTrace(Thread.java:1602),
>     org.apache.ratis.server.impl.RoleInfo.startFollowerState(RoleInfo.java:104),
>     org.apache.ratis.server.impl.RaftServerImpl.changeToFollower(RaftServerImpl.java:547),
>     org.apache.ratis.server.impl.RaftServerImpl.changeToFollowerAndPersistMetadata(RaftServerImpl.java:556),
>     org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1498),
>     org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1396),
>     org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$26(RaftServerProxy.java:639),
>     org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117),
>     org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:818),
>     java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700),
>     java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128),
>     java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628),
>     java.base/java.lang.Thread.run(Thread.java:829)
> 2023-02-28 04:52:45,055 [s1-server-thread1] INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(140)) - s1: start s1@group-43918D205BB2-FollowerState
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
