[ https://issues.apache.org/jira/browse/KAFKA-19643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zzshine updated KAFKA-19643:
----------------------------
    Description:
Inter-cluster communication is normal, with no packet loss, and the cluster is properly configured.

The Kafka server continuously prints the following logs:

[2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected (elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected (elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)

The Kafka parameters have been adjusted as follows:
{code:java}
# default 2000
broker.heartbeat.interval.ms=4000
# default 9000
broker.session.timeout.ms=10000
# default 2000
controller.quorum.request.timeout.ms=5000
# default 1000
controller.quorum.election.timeout.ms=5000
# default 1000
controller.quorum.election.backoff.max.ms=3000
# default 2000
controller.quorum.fetch.timeout.ms=6000
{code}

  was:
Inter-cluster communication is normal, with no packet loss, and the cluster is properly configured.

The Kafka server continuously prints the following logs:

[2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected (elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected (elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)

Kafka KRaft config is:

# default 2000
broker.heartbeat.interval.ms=4000
# default 9000
broker.session.timeout.ms=10000
# default 2000
controller.quorum.request.timeout.ms=5000
# default 1000
controller.quorum.election.timeout.ms=5000
# default 1000
controller.quorum.election.backoff.max.ms=3000
# default 2000
controller.quorum.fetch.timeout.ms=6000


> Controller keeps switching and occasionally goes offline.
> ---------------------------------------------------------
>
>                 Key: KAFKA-19643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19643
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, kraft
>    Affects Versions: 3.9.1
>         Environment: CentOS Linux 7, kernel-release: 4.19.325
> Java 21
>            Reporter: zzshine
>            Priority: Major
>         Attachments: xxx.png
>
>
> Inter-cluster communication is normal, with no packet loss, and the cluster is properly configured.
>
> The Kafka server continuously prints the following logs:
>
> [2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected (elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected (elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
>
> The Kafka parameters have been adjusted as follows:
> {code:java}
> # default 2000
> broker.heartbeat.interval.ms=4000
> # default 9000
> broker.session.timeout.ms=10000
> # default 2000
> controller.quorum.request.timeout.ms=5000
> # default 1000
> controller.quorum.election.timeout.ms=5000
> # default 1000
> controller.quorum.election.backoff.max.ms=3000
> # default 2000
> controller.quorum.fetch.timeout.ms=6000
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
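
To see how far each cancelled request in the description ran past its timeout, the "Cancelled in-flight" lines can be parsed with a short script. This is only a minimal sketch, assuming the log format matches the lines quoted in this issue and that each entry is available unwrapped (without the "> " quote prefix). The names `CANCELLED`, `SAMPLE_LOG`, and `timeout_overruns` are hypothetical and not part of Kafka.

```python
import re

# Pattern for the NetworkClient "Cancelled in-flight" message, modelled only
# on the log lines quoted in this issue (an assumption, not a Kafka API).
CANCELLED = re.compile(
    r"Cancelled in-flight (?P<api>\w+) request with correlation id (?P<cid>\d+) "
    r"due to node (?P<node>\d+) being disconnected "
    r"\(elapsed time since creation: (?P<created>\d+)ms, "
    r"elapsed time since send: (?P<sent>\d+)ms, "
    r"throttle time: (?P<throttle>\d+)ms, "
    r"request timeout: (?P<timeout>\d+)ms\)"
)

# The three cancelled requests reported in the description.
SAMPLE_LOG = """
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected (elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected (elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
"""


def timeout_overruns(log_text):
    """Return one record per cancelled request, with its overrun in ms."""
    # Collapse whitespace so entries wrapped across lines still match.
    normalized = " ".join(log_text.split())
    records = []
    for match in CANCELLED.finditer(normalized):
        sent = int(match["sent"])
        timeout = int(match["timeout"])
        records.append({
            "api": match["api"],
            "node": int(match["node"]),
            "elapsed_ms": sent,
            "timeout_ms": timeout,
            "overrun_ms": sent - timeout,
        })
    return records


if __name__ == "__main__":
    for record in timeout_overruns(SAMPLE_LOG):
        print(record)
```

For the three entries above, the overruns relative to the logged request timeout are 146 ms, 4 ms, and 720 ms.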