[
https://issues.apache.org/jira/browse/RATIS-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinhao Gu updated RATIS-2300:
-----------------------------
Description:
I've encountered an issue where, after a leader election, the newly elected
leader consistently fails to communicate with the previous (old) leader.
Specifically, RPCs (such as heartbeats or appendEntries requests) sent from the
new leader to the old leader always time out.
!image-2025-05-20-12-06-57-540.png!
{code:java}
2025-05-18 01:09:14,974 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=1, request=AppendEntriesRequest:cid=0,entriesCount=0
2025-05-18 01:09:14,975 [timer1] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=2, request=AppendEntriesRequest:cid=1,entriesCount=0
2025-05-18 01:09:14,976 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out appendEntries, errorCount=3, request=AppendEntriesRequest:cid=2,entriesCount=4,entries=(t:2732, i:2069702)...(t:2732, i:2069705)
2025-05-18 01:09:15,271 [timer4] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=4, request=AppendEntriesRequest:cid=3,entriesCount=0
2025-05-18 01:09:15,773 [timer5] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=5, request=AppendEntriesRequest:cid=4,entriesCount=0
2025-05-18 01:09:16,274 [timer7] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=6, request=AppendEntriesRequest:cid=5,entriesCount=0
2025-05-18 01:09:16,775 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=7, request=AppendEntriesRequest:cid=6,entriesCount=0
2025-05-18 01:09:17,274 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=8, request=AppendEntriesRequest:cid=7,entriesCount=0
{code}
was:
I've encountered an issue where, after a leader election, the newly elected
leader consistently fails to communicate with the previous (old) leader.
Specifically, RPCs (such as heartbeats or appendEntries requests) sent from the
new leader to the old leader always time out.
{code:java}
2025-05-18 01:09:14,974 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=1, request=AppendEntriesRequest:cid=0,entriesCount=0
2025-05-18 01:09:14,975 [timer1] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=2, request=AppendEntriesRequest:cid=1,entriesCount=0
2025-05-18 01:09:14,976 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out appendEntries, errorCount=3, request=AppendEntriesRequest:cid=2,entriesCount=4,entries=(t:2732, i:2069702)...(t:2732, i:2069705)
2025-05-18 01:09:15,271 [timer4] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=4, request=AppendEntriesRequest:cid=3,entriesCount=0
2025-05-18 01:09:15,773 [timer5] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=5, request=AppendEntriesRequest:cid=4,entriesCount=0
2025-05-18 01:09:16,274 [timer7] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=6, request=AppendEntriesRequest:cid=5,entriesCount=0
2025-05-18 01:09:16,775 [timer0] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=7, request=AppendEntriesRequest:cid=6,entriesCount=0
2025-05-18 01:09:17,274 [timer2] WARN o.a.r.g.s.GrpcLogAppender:441 - 1@group-000000000000->0-GrpcLogAppender: Timed out HEARTBEAT appendEntries, errorCount=8, request=AppendEntriesRequest:cid=7,entriesCount=0
{code}
> New Leader Repeatedly Times Out When Sending RPCs to Old Leader, Causing
> System Hang
> ------------------------------------------------------------------------------------
>
> Key: RATIS-2300
> URL: https://issues.apache.org/jira/browse/RATIS-2300
> Project: Ratis
> Issue Type: Bug
> Reporter: Xinhao Gu
> Assignee: Xinhao Gu
> Priority: Major
> Attachments: image-2025-05-20-12-06-57-540.png
>