[
https://issues.apache.org/jira/browse/RATIS-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinhao Gu updated RATIS-2300:
-----------------------------
Description:
h2. *Problem Description*
I've encountered an issue where, after a leader election, the newly elected
leader consistently fails to communicate with the previous (old) leader.
Specifically, RPCs (such as heartbeats or AppendEntries requests) sent from the
new leader to the old leader always time out.
h2. *Scene Description*
*Initial State:* Node {{cn0}} is the established leader of the Raft group.
*Follower Timeout & Election:* At {{01:09:02,862}}, node {{cn1}} (a
follower) experiences an election timeout, presumably because it did not
receive timely heartbeats from the leader {{cn0}}.
!image-2025-05-20-12-18-05-012.png|width=1699,height=78!
*Candidacy:* {{cn1}} transitions to the candidate state and starts a new leader
election for a new term.
*New Leader Elected:* Node {{cn2}} votes for {{cn1}}. Subsequently, {{cn1}}
successfully gathers enough votes and becomes the new leader of the Raft group.
!image-2025-05-20-12-18-50-729.png|width=1895,height=488!
*Communication Failure with Old Leader:* Following this, the new leader
({{cn1}}) begins its attempts to manage the group. However, when {{cn1}} tries
to send RPCs (e.g., heartbeats/AppendEntries) to the _previous_ leader
({{cn0}}), these attempts consistently time out.
!image-2025-05-20-12-11-08-832.png|width=1803,height=201!
!image-2025-05-20-12-06-57-540.png|width=2015,height=689!
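For comparison, this is the behaviour one would normally expect from the old leader under the Raft protocol: on receiving an AppendEntries RPC carrying a higher term, it adopts that term, steps down to follower, and replies. The sketch below is only a simplified model of that rule. The class, field, and method names are hypothetical and are not Ratis code, and the term numbers are made up for illustration.

```java
/** Simplified model of Raft's term check on AppendEntries; not Ratis code. */
class TermStepDownSketch {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    /** Hypothetical node holding only the state needed for the term check. */
    static final class Node {
        final String id;
        long currentTerm;
        Role role;
        String leaderId;

        Node(String id, long currentTerm, Role role) {
            this.id = id;
            this.currentTerm = currentTerm;
            this.role = role;
        }

        /**
         * Handles an AppendEntries (or heartbeat) from senderId at senderTerm.
         * Returns true if the request is accepted, false if it is rejected
         * because the sender's term is stale.
         */
        boolean onAppendEntries(String senderId, long senderTerm) {
            if (senderTerm < currentTerm) {
                return false; // stale leader: reject, keep current state
            }
            // Sender's term is current or newer: adopt it and follow the sender.
            currentTerm = senderTerm;
            role = Role.FOLLOWER;
            leaderId = senderId;
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed terms: cn0 led in term 1, cn1 won the election for term 2.
        Node cn0 = new Node("cn0", 1L, Role.LEADER);
        boolean accepted = cn0.onAppendEntries("cn1", 2L);
        System.out.println("accepted=" + accepted
            + " role=" + cn0.role + " term=" + cn0.currentTerm);
    }
}
```

In the scenario reported here, {{cn0}} never appears to reach a step like this, since the RPCs from {{cn1}} time out without a reply.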
h3. Follower's log
During this period, the follower did not output any logs.
was:
I've encountered an issue where, after a leader election, the newly elected
leader consistently fails to communicate with the previous (old) leader.
Specifically, RPCs (such as heartbeats or appendEntries requests) sent from the
new leader to the old leader always time out.
!image-2025-05-20-12-06-57-540.png|width=2015,height=689!
> New Leader Repeatedly Times Out When Sending RPCs to Old Leader, Causing
> System Hang
> ------------------------------------------------------------------------------------
>
> Key: RATIS-2300
> URL: https://issues.apache.org/jira/browse/RATIS-2300
> Project: Ratis
> Issue Type: Bug
> Reporter: Xinhao Gu
> Assignee: Xinhao Gu
> Priority: Major
> Attachments: image-2025-05-20-12-06-57-540.png,
> image-2025-05-20-12-11-08-832.png, image-2025-05-20-12-18-05-012.png,
> image-2025-05-20-12-18-50-729.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)