[ 
https://issues.apache.org/jira/browse/RATIS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697219#comment-17697219
 ] 

Kaijie Chen commented on RATIS-1796:
------------------------------------

{quote}I have confirmed it's caused by heartbeat (see new logs in description) 
and updated the title of this Jira.
You can see more logs in artifacts of this CI: 
[https://github.com/kaijchen/ratis/actions/runs/4349993784]
{quote}
Searching for {{isHeartbeat}} in the logs, there are a few cases which are not 
caused by heartbeat (2/13).

I'll look into it later. 
{code:java}
$ grep isHeartbeat unit-*/iteration*/ratis-test/*-output.txt
unit-1/iteration3/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:28:54,806 [s1-server-thread4] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-1/iteration9/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:30:57,180 [s0-server-thread3] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-3/iteration4/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:29:15,824 [s1-server-thread3] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
reason:appendEntries: s0, term: 2, isHeartbeat: true, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:28:41,616 [s1-server-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
reason:appendEntries: s0, term: 1, isHeartbeat: false, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:28:41,634 [s2-server-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s2: startFollowerState 
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:28:42,402 [s1-server-thread3] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration3/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:29:07,262 [s1-server-thread3] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration6/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:30:23,908 [s0-server-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-6/iteration6/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:30:25,947 [s0-server-thread3] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-6/iteration7/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:30:55,072 [s0-server-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-7/iteration10/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:32:05,381 [s0-server-thread1] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: false, trace:
unit-7/iteration10/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:32:08,909 [s0-server-thread2] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-9/iteration7/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
 02:30:15,609 [s0-server-thread2] INFO  impl.RoleInfo 
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState 
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
{code}

> Fix TransferLeadership stopped by heartbeat from old leader
> -----------------------------------------------------------
>
>                 Key: RATIS-1796
>                 URL: https://issues.apache.org/jira/browse/RATIS-1796
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Kaijie Chen
>            Assignee: Kaijie Chen
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Candidate state of transferee may be stopped by the appendEntries from old 
> leader, see the log below 
> {code:java}
> 2023-02-28 04:52:45,026 [s0-server-thread1] INFO  impl.TransferLeadership 
> (TransferLeadership.java:tryTransferLeadership(107)) - s0@group-43918D205BB2: 
> start transferring leadership to s1
> 2023-02-28 04:52:45,029 [s0-server-thread1] INFO  impl.TransferLeadership 
> (TransferLeadership.java:tryTransferLeadership(116)) - s0@group-43918D205BB2: 
> sent StartLeaderElection to transferee s1 immediately as it already has 
> up-to-date log
> 2023-02-28 04:52:45,031 [grpc-default-executor-6] INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownFollowerState(111)) - s1: shutdown 
> s1@group-43918D205BB2-FollowerState
> 2023-02-28 04:52:45,032 [s1@group-43918D205BB2-FollowerState] INFO  
> impl.FollowerState (FollowerState.java:run(152)) - 
> s1@group-43918D205BB2-FollowerState was interrupted
> 2023-02-28 04:52:45,032 [grpc-default-executor-6] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(140)) - s1: start 
> s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownLeaderElection(131)) - s1: shutdown 
> s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO  impl.RoleInfo 
> (RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
> reason:appendEntries from s0 term 1,
>   trace: java.base/java.lang.Thread.getStackTrace(Thread.java:1602),
>     
> org.apache.ratis.server.impl.RoleInfo.startFollowerState(RoleInfo.java:104),
>     
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollower(RaftServerImpl.java:547),
>     
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollowerAndPersistMetadata(RaftServerImpl.java:556),
>     
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1498),
>     
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1396),
>     
> org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$26(RaftServerProxy.java:639),
>     org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117),
>     
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:818),
>     
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700),
>     
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128),
>     
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628),
>     java.base/java.lang.Thread.run(Thread.java:829)
> 2023-02-28 04:52:45,055 [s1-server-thread1] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(140)) - s1: start 
> s1@group-43918D205BB2-FollowerState
> {code}
> Update: print {{isHeartbeat}} in log
> {code:java}
> 2023-03-07 02:28:54,806 [s1-server-thread4] INFO  impl.RoleInfo 
> (RoleInfo.java:startFollowerState(104)) - s1: startFollowerState 
> reason:appendEntries: s0, term: 1, isHeartbeat: true,
>   trace:
>     java.base/java.lang.Thread.getStackTrace(Thread.java:1602),
>     
> org.apache.ratis.server.impl.RoleInfo.startFollowerState(RoleInfo.java:104),
>     
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollower(RaftServerImpl.java:547),
>     
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollowerAndPersistMetadata(RaftServerImpl.java:556),
>     
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1498),
>     
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1396),
>     
> org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$26(RaftServerProxy.java:639),
>     org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117),
>     
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:818),
>     
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700),
>     
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128),
>     
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628),
>     java.base/java.lang.Thread.run(Thread.java:829)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to