[ 
https://issues.apache.org/jira/browse/KAFKA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082268#comment-18082268
 ] 

Jerome Waibel commented on KAFKA-19785:
---------------------------------------

I agree that the quorum file likely did not exist during the restart, but 
that does not make any sense to me.

I'm using the Strimzi operator, and I think their default setup is pretty solid.

All Kafka data is stored on the same persistent volume, including all 
topics, the metadata and of course the quorum file. The volume is at 1% 
usage, so a full disk cannot have caused data loss. Our cluster is running in 
Google Cloud, so I would also trust their storage and rule out accidental data 
loss on the disk.

When I looked into the folder on my production Kafka cluster just now, the 
quorum file was there on both nodes, and the content looked reasonable. I even 
performed a manual restart of one Kafka node, and everything came back as 
expected: no exceptions, leadership was decided quickly, and after 2 minutes 
everything was back to normal.

I'm clueless, but I'd like to thank you for your time invested in this. Maybe, 
when the incident happens again, I can have a closer look at the quorum file 
and get some further ideas. 
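
For that closer look, a small check along the following lines could record 
what the file contains before the node is restarted. This is only a sketch 
under assumptions: it assumes the file is the JSON quorum-state file in the 
__cluster_metadata-0 directory and that it carries the leaderId, leaderEpoch 
and votedId fields. The function names are placeholders of mine, not part of 
Kafka or Strimzi.

```python
import json
from pathlib import Path


def read_quorum_state(path):
    """Return the parsed quorum-state JSON, or None if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return json.loads(p.read_text(encoding="utf-8"))


def describe_quorum_state(state):
    """Summarize the fields that matter for leader election.

    Assumption: a negative leaderId or votedId means "not set".
    """
    if state is None:
        return "quorum-state file is missing"
    leader = state.get("leaderId", -1)
    epoch = state.get("leaderEpoch", -1)
    voted = state.get("votedId", -1)
    leader_text = "no leader recorded" if leader < 0 else f"leader {leader}"
    voted_text = "no vote recorded" if voted < 0 else f"voted for {voted}"
    return f"epoch {epoch}, {leader_text}, {voted_text}"
```

Calling describe_quorum_state(read_quorum_state(path)) on each node, with path 
pointing at that node's quorum-state file, and comparing the results should 
show whether a node restarted with a missing file or with an epoch but no 
leader.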

> Two Kafka brokers were not active in 3 node cluster setup
> ---------------------------------------------------------
>
>                 Key: KAFKA-19785
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19785
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, kraft
>    Affects Versions: 4.0.0
>            Reporter: Sravani
>            Priority: Major
>              Labels: kraft
>
> Hi Team,
> We were facing a Kafka issue where two of the Kafka brokers were fenced and 
> Kafka was not able to process messages. We are using Kafka 4.0.0. 
> The errors are below.
>  
> Sep 22 09:41:42 host kafka[42245]: [2025-09-22 07:41:42,419] ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> Sep 22 09:41:42 host kafka[42245]: java.lang.IllegalStateException: Received request or response with leader OptionalInt[3] and epoch 55 which is inconsistent with current leader OptionalInt.empty and epoch 55
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2528) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeHandleCommonResponse(KafkaRaftClient.java:2484) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1707) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:2568) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2724) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3460) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClientDriver.doWork(KafkaRaftClientDriver.java:64) [kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136) [kafka-server-common-4.0.0.jar:?]
> The metrics below show a FencedBrokerCount of 2.0:
> kafka_controller_KafkaController_Value{name="ActiveBrokerCount",} 1.0
> kafka_controller_KafkaController_Value{name="GlobalTopicCount",} 23.0
> kafka_controller_KafkaController_Value{name="FencedBrokerCount",} 2.0
> Please help us to resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
