[ https://issues.apache.org/jira/browse/KAFKA-19643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zzshine updated KAFKA-19643:
----------------------------
    Attachment:     (was: xxx.png)

> Controller keeps switching and occasionally goes offline.
> ---------------------------------------------------------
>
>                 Key: KAFKA-19643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19643
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, kraft
>    Affects Versions: 3.9.1
>         Environment: CentOS Linux 7,kernel-release:4.19.325
> Java 21
>            Reporter: zzshine
>            Priority: Major
>         Attachments: part_leader_to_one_node.png
>
>
> Inter-cluster communication is normal without packet loss, and the cluster is
> properly configured.
> The Kafka server continuously prints the following logs:
> {code:java}
> [2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to
> fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2
> due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH
> request with correlation id 128927 due to node 2 being disconnected (elapsed
> time since creation: 5147ms, elapsed time since send: 5146ms, throttle time:
> 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
> name=heartbeat] Disconnecting from node 3 due to request timeout.
> (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
> name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation
> id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms,
> elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)
> (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3
> due to request timeout.
> (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH
> request with correlation id 128995 due to node 3 being disconnected (elapsed
> time since creation: 5720ms, elapsed time since send: 5720ms, throttle time:
> 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient) {code}
> Adjust Kafka parameters as follows:
> {code:java}
> # default 2000
> broker.heartbeat.interval.ms=4000
> # default 9000
> broker.session.timeout.ms=10000
> # default 2000
> controller.quorum.request.timeout.ms=5000
> # default 1000
> controller.quorum.election.timeout.ms=5000
> # default 1000
> controller.quorum.election.backoff.max.ms=3000
> # default 2000
> controller.quorum.fetch.timeout.ms=6000 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
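
For triage, the "Cancelled in-flight" lines quoted in the report can be reduced to a per-request overrun (elapsed time minus request timeout). The following is only a rough sketch: the names `CANCELLED`, `SAMPLE`, and `timeout_overruns` are hypothetical, and the regular expression is inferred solely from the log lines quoted in this issue, not from any Kafka API or documented log format.

```python
import re

# Pattern inferred from the NetworkClient lines quoted in this issue (assumption:
# other Kafka versions may word these messages differently).
CANCELLED = re.compile(
    r"Cancelled in-flight (?P<api>\w+) request with correlation id (?P<cid>\d+) "
    r"due to node (?P<node>\d+) being disconnected "
    r"\(elapsed time since creation: (?P<created>\d+)ms, "
    r"elapsed time since send: (?P<sent>\d+)ms, "
    r"throttle time: \d+ms, request timeout: (?P<timeout>\d+)ms\)"
)

# Three entries copied from the report, still wrapped and "> "-quoted as in the mail.
SAMPLE = """\
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH
> request with correlation id 128927 due to node 2 being disconnected (elapsed
> time since creation: 5147ms, elapsed time since send: 5146ms, throttle time:
> 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
> name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation
> id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms,
> elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)
> (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH
> request with correlation id 128995 due to node 3 being disconnected (elapsed
> time since creation: 5720ms, elapsed time since send: 5720ms, throttle time:
> 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
"""


def timeout_overruns(log_text):
    """Return one dict per cancelled request with its overrun in milliseconds."""
    # Drop leading "> " quote markers, then collapse the mail's line wrapping.
    unquoted = re.sub(r"(?m)^[ \t]*>[ \t]?", "", log_text)
    flat = " ".join(unquoted.split())
    results = []
    for m in CANCELLED.finditer(flat):
        created = int(m.group("created"))
        timeout = int(m.group("timeout"))
        results.append({
            "api": m.group("api"),
            "node": int(m.group("node")),
            "elapsed_ms": created,
            "timeout_ms": timeout,
            "overrun_ms": created - timeout,
        })
    return results


if __name__ == "__main__":
    for row in timeout_overruns(SAMPLE):
        print(row["api"], "node", row["node"], "overrun", row["overrun_ms"], "ms")
```

On the three quoted entries this yields overruns of 147 ms, 4 ms, and 720 ms, which shows each request being cancelled shortly after its configured request timeout expired.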