[ 
https://issues.apache.org/jira/browse/KAFKA-19643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzshine updated KAFKA-19643:
----------------------------
    Description: 
Network communication between the cluster nodes is normal, with no packet 
loss, and the cluster is properly configured.
The Kafka server continuously prints the following logs:
{code:java}
[2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch 
timeout (org.apache.kafka.raft.KafkaRaftClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due 
to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH 
request with correlation id 128927 due to node 2 being disconnected (elapsed 
time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Disconnecting from node 3 due to request timeout. 
(org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 
name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation 
id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, 
elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) 
(org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due 
to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH 
request with correlation id 128995 due to node 3 being disconnected (elapsed 
time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient) {code}
The Kafka parameters were adjusted as follows (defaults noted in the comments):
{code:java}
# default 2000
broker.heartbeat.interval.ms=4000
# default 9000
broker.session.timeout.ms=10000
# default 2000
controller.quorum.request.timeout.ms=5000
# default 1000
controller.quorum.election.timeout.ms=5000
# default 1000
controller.quorum.election.backoff.max.ms=3000
# default 2000
controller.quorum.fetch.timeout.ms=6000 {code}
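The cancelled requests in the log above can be tallied per API, node, and request timeout to see which peers time out most often. The following is only a rough sketch and not part of Kafka: the function name summarize_timeouts and the regular expression are assumptions based on the log lines quoted in this issue. In those lines the reported request timeouts (5000ms for FETCH, 4000ms for BROKER_HEARTBEAT) equal the configured controller.quorum.request.timeout.ms and broker.heartbeat.interval.ms values; whether the heartbeat timeout is actually derived from that setting is not confirmed here.
{code:python}
import re
from collections import Counter

# Assumed pattern, modelled on the NetworkClient "Cancelled in-flight" lines
# quoted in this issue; other Kafka versions may word these lines differently.
CANCELLED = re.compile(
    r"Cancelled in-flight (?P<api>\w+) request with correlation id \d+ "
    r"due to node (?P<node>\d+) being disconnected "
    r"\(elapsed time since creation: \d+ms.*?"
    r"request timeout: (?P<timeout>\d+)ms\)"
)


def summarize_timeouts(log_text):
    """Count cancelled requests per (API, node id, request timeout in ms)."""
    # The log lines are wrapped in this message, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in CANCELLED.finditer(flat):
        counts[(m["api"], int(m["node"]), int(m["timeout"]))] += 1
    return counts


# Two entries copied from the log excerpt in this issue.
sample = """
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH
request with correlation id 128927 due to node 2 being disconnected (elapsed
time since creation: 5147ms, elapsed time since send: 5146ms, throttle time:
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation
id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms,
elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)
(org.apache.kafka.clients.NetworkClient)
"""

for (api, node, timeout_ms), n in sorted(summarize_timeouts(sample).items()):
    print(f"{api} to node {node}: {n} cancelled, request timeout {timeout_ms}ms")
{code}
Running the same function over a full server.log would show whether the timeouts cluster on one controller node or are spread across the quorum.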


> Controller keeps switching and occasionally goes offline.
> ---------------------------------------------------------
>
>                 Key: KAFKA-19643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19643
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, kraft
>    Affects Versions: 3.9.1
>         Environment: CentOS Linux 7, kernel release 4.19.325, Java 21
>            Reporter: zzshine
>            Priority: Major
>         Attachments: xxx.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
