[
https://issues.apache.org/jira/browse/RATIS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697219#comment-17697219
]
Kaijie Chen commented on RATIS-1796:
------------------------------------
{quote}I have confirmed it's caused by heartbeat (see new logs in description)
and updated the title of this Jira.
You can see more logs in artifacts of this CI:
[https://github.com/kaijchen/ratis/actions/runs/4349993784]
{quote}
Searching for {{isHeartbeat}} in the logs, there are a few cases which are not
caused by heartbeat (2/13).
I'll look into it later.
{code:java}
$ grep isHeartbeat unit-*/iteration*/ratis-test/*-output.txt
unit-1/iteration3/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:28:54,806 [s1-server-thread4] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-1/iteration9/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:30:57,180 [s0-server-thread3] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-3/iteration4/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:29:15,824 [s1-server-thread3] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
reason:appendEntries: s0, term: 2, isHeartbeat: true, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:28:41,616 [s1-server-thread1] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
reason:appendEntries: s0, term: 1, isHeartbeat: false, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:28:41,634 [s2-server-thread1] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s2: startFollowerState
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration2/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:28:42,402 [s1-server-thread3] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration3/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:29:07,262 [s1-server-thread3] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
reason:appendEntries: s0, term: 1, isHeartbeat: true, trace:
unit-6/iteration6/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:30:23,908 [s0-server-thread1] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-6/iteration6/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:30:25,947 [s0-server-thread3] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-6/iteration7/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:30:55,072 [s0-server-thread1] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-7/iteration10/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:32:05,381 [s0-server-thread1] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: false, trace:
unit-7/iteration10/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:32:08,909 [s0-server-thread2] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
unit-9/iteration7/ratis-test/org.apache.ratis.grpc.cli.TestElectionCommandIntegrationWithGrpc-output.txt:2023-03-07
02:30:15,609 [s0-server-thread2] INFO impl.RoleInfo
(RoleInfo.java:startFollowerState(104)) - s0: startFollowerState
reason:appendEntries: s1, term: 1, isHeartbeat: true, trace:
{code}
> Fix TransferLeadership stopped by heartbeat from old leader
> -----------------------------------------------------------
>
> Key: RATIS-1796
> URL: https://issues.apache.org/jira/browse/RATIS-1796
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Assignee: Kaijie Chen
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Candidate state of transferee may be stopped by the appendEntries from old
> leader, see the log below
> {code:java}
> 2023-02-28 04:52:45,026 [s0-server-thread1] INFO impl.TransferLeadership
> (TransferLeadership.java:tryTransferLeadership(107)) - s0@group-43918D205BB2:
> start transferring leadership to s1
> 2023-02-28 04:52:45,029 [s0-server-thread1] INFO impl.TransferLeadership
> (TransferLeadership.java:tryTransferLeadership(116)) - s0@group-43918D205BB2:
> sent StartLeaderElection to transferee s1 immediately as it already has
> up-to-date log
> 2023-02-28 04:52:45,031 [grpc-default-executor-6] INFO impl.RoleInfo
> (RoleInfo.java:shutdownFollowerState(111)) - s1: shutdown
> s1@group-43918D205BB2-FollowerState
> 2023-02-28 04:52:45,032 [s1@group-43918D205BB2-FollowerState] INFO
> impl.FollowerState (FollowerState.java:run(152)) -
> s1@group-43918D205BB2-FollowerState was interrupted
> 2023-02-28 04:52:45,032 [grpc-default-executor-6] INFO impl.RoleInfo
> (RoleInfo.java:updateAndGet(140)) - s1: start
> s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO impl.RoleInfo
> (RoleInfo.java:shutdownLeaderElection(131)) - s1: shutdown
> s1@group-43918D205BB2-LeaderElection4
> 2023-02-28 04:52:45,054 [s1-server-thread1] INFO impl.RoleInfo
> (RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
> reason:appendEntries from s0 term 1,
> trace: java.base/java.lang.Thread.getStackTrace(Thread.java:1602),
>
> org.apache.ratis.server.impl.RoleInfo.startFollowerState(RoleInfo.java:104),
>
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollower(RaftServerImpl.java:547),
>
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollowerAndPersistMetadata(RaftServerImpl.java:556),
>
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1498),
>
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1396),
>
> org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$26(RaftServerProxy.java:639),
> org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117),
>
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:818),
>
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700),
>
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128),
>
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628),
> java.base/java.lang.Thread.run(Thread.java:829)
> 2023-02-28 04:52:45,055 [s1-server-thread1] INFO impl.RoleInfo
> (RoleInfo.java:updateAndGet(140)) - s1: start
> s1@group-43918D205BB2-FollowerState
> {code}
> Update: print {{isHeartbeat}} in log
> {code:java}
> 2023-03-07 02:28:54,806 [s1-server-thread4] INFO impl.RoleInfo
> (RoleInfo.java:startFollowerState(104)) - s1: startFollowerState
> reason:appendEntries: s0, term: 1, isHeartbeat: true,
> trace:
> java.base/java.lang.Thread.getStackTrace(Thread.java:1602),
>
> org.apache.ratis.server.impl.RoleInfo.startFollowerState(RoleInfo.java:104),
>
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollower(RaftServerImpl.java:547),
>
> org.apache.ratis.server.impl.RaftServerImpl.changeToFollowerAndPersistMetadata(RaftServerImpl.java:556),
>
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1498),
>
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1396),
>
> org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$26(RaftServerProxy.java:639),
> org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117),
>
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:818),
>
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700),
>
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128),
>
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628),
> java.base/java.lang.Thread.run(Thread.java:829)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)