[
https://issues.apache.org/jira/browse/KAFKA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082268#comment-18082268
]
Jerome Waibel commented on KAFKA-19785:
---------------------------------------
I agree that the quorum file likely did not exist during the restart, but that
does not make any sense to me.
I'm using the Strimzi operator, and I think their default setup is pretty solid.
All data from Kafka is stored on the same persistent volume, including all
topics and metadata and of course also the quorum-file. The volume is at 1%
usage, so no data loss due to a full hard drive. Our cluster is running in
Google Cloud, so I would also trust their storage and rule out accidental data
loss on the hard drive.
When I looked into the folder on my current production Kafka right now, the
quorum file is there on both nodes, and the content looks reasonable. I even
performed a manual restart of one Kafka node, and everything came back as
expected: no exceptions, leadership was decided quickly, and after 2 minutes
everything was back to normal.
I'm clueless, but I'd like to thank you for your time invested in this. Maybe,
when the incident happens again, I can have a closer look at the quorum file
and get some further ideas.
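For the next occurrence, a small script along these lines could capture the
state of the quorum file before anything restarts. This is only a sketch: it
assumes the file is the JSON file named quorum-state inside the
__cluster_metadata-0 directory of the metadata log dir, and that it carries
leaderId and leaderEpoch fields. Both assumptions should be checked against
the actual volume, and the helper names are made up for illustration.

```python
import json
from pathlib import Path


def read_quorum_state(log_dir):
    """Return the parsed quorum-state file, or None if it does not exist."""
    # Assumed location: <metadata log dir>/__cluster_metadata-0/quorum-state
    path = Path(log_dir) / "__cluster_metadata-0" / "quorum-state"
    if not path.is_file():
        return None
    return json.loads(path.read_text())


def summarize(state):
    """Build a one-line summary; the field names are assumptions."""
    if state is None:
        return "quorum-state file is missing"
    leader = state.get("leaderId", "?")
    epoch = state.get("leaderEpoch", "?")
    return f"leaderId={leader} leaderEpoch={epoch}"
```

Calling summarize(read_quorum_state(log_dir)) on each node, with log_dir set
to the mounted data directory, would show whether the file is present and
which leader and epoch it records at that moment.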
> Two Kafka brokers were not active in 3 node cluster setup
> ---------------------------------------------------------
>
> Key: KAFKA-19785
> URL: https://issues.apache.org/jira/browse/KAFKA-19785
> Project: Kafka
> Issue Type: Bug
> Components: core, kraft
> Affects Versions: 4.0.0
> Reporter: Sravani
> Priority: Major
> Labels: kraft
>
> Hi Team,
> We faced a Kafka issue where two of the Kafka brokers were fenced and
> Kafka was not able to process messages. We are using Kafka version 4.0.0.
> The errors are below.
>
> Sep 22 09:41:42 host kafka[42245]: [2025-09-22 07:41:42,419] ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> Sep 22 09:41:42 host kafka[42245]: java.lang.IllegalStateException: Received request or response with leader OptionalInt[3] and epoch 55 which is inconsistent with current leader OptionalInt.empty and epoch 55
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2528) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeHandleCommonResponse(KafkaRaftClient.java:2484) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1707) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:2568) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2724) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3460) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClientDriver.doWork(KafkaRaftClientDriver.java:64) [kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136) [kafka-server-common-4.0.0.jar:?]
> The metrics below show a FencedBrokerCount of 2.0:
> kafka_controller_KafkaController_Value{name="ActiveBrokerCount",} 1.0
> kafka_controller_KafkaController_Value{name="GlobalTopicCount",} 23.0
> kafka_controller_KafkaController_Value{name="FencedBrokerCount",} 2.0
> Please help us resolve this issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)