[ 
https://issues.apache.org/jira/browse/KAFKA-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081786#comment-18081786
 ] 

Kevin Wu commented on KAFKA-19785:
----------------------------------

Hi [~jwaibel2],

 
{noformat}
2026-05-02 00:56:00 WARN [main] QuorumState:158 - [RaftManager id=1] Epoch from quorum store file (/var/lib/kafka/data-0/kafka-log1/__cluster_metadata-0/quorum-state) is 0, which is smaller than last written epoch 66 in the log
{noformat}
From my reading of the code, an epoch of 0 means that this file did not exist
when the pod was restarted. Take a look at `QuorumState#L135`. Are you able to
confirm that the file does not exist at the time of restart? If so, that could
be an operator misconfiguration. We should add a log message to Kafka for this
case. It's interesting to me that the `quorum-state` file seems to be lost, but
not the log itself.
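
One way to confirm this at restart time is a small check run before the broker starts, for example from an init container. The following is only a sketch: the function name `inspect_metadata_dir` is hypothetical, and it assumes the metadata log segments are the `*.log` files in the same directory as `quorum-state`, as in the path from the warning above.

```python
from pathlib import Path


def inspect_metadata_dir(metadata_dir):
    """Report whether quorum-state exists and which log segments are present.

    Hypothetical helper, not part of Kafka. `metadata_dir` is the
    __cluster_metadata-0 directory inside the configured log directory.
    """
    d = Path(metadata_dir)
    return {
        # The file the warning refers to.
        "quorum_state_exists": (d / "quorum-state").is_file(),
        # Segment files of the metadata log, sorted by name.
        "log_segments": sorted(p.name for p in d.glob("*.log")),
    }


if __name__ == "__main__":
    # Path taken from the warning in the log; adjust for your deployment.
    path = "/var/lib/kafka/data-0/kafka-log1/__cluster_metadata-0"
    print(inspect_metadata_dir(path))
```

If this reports log segments but no `quorum-state` file, that would match the situation the warning describes.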

Raft assumes that persistent data, such as the information in `quorum-state`,
cannot be permanently lost for a voter. For example, losing this data can
violate the one-vote-per-epoch invariant, resulting in two leaders for the same
epoch. If this persistent data is lost, the second incarnation of your pod is a
new logical voter from the perspective of KRaft.
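
To make the invariant concrete, here is a toy model of a voter that persists its epoch and vote before granting it. This is an illustration only, not Kafka's implementation: the `Voter` class, the JSON file layout, and the candidate names are all made up for the example.

```python
import json
import os
import tempfile


class Voter:
    """Toy voter that persists (epoch, voted_for) before granting a vote."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.epoch, self.voted_for = self._load()

    def _load(self):
        # A missing file is indistinguishable from a brand-new voter.
        if not os.path.exists(self.state_path):
            return 0, None
        with open(self.state_path) as f:
            state = json.load(f)
        return state["epoch"], state["voted_for"]

    def _persist(self):
        with open(self.state_path, "w") as f:
            json.dump({"epoch": self.epoch, "voted_for": self.voted_for}, f)

    def handle_vote_request(self, candidate, epoch):
        if epoch < self.epoch:
            return False  # stale request
        if epoch > self.epoch:
            self.epoch, self.voted_for = epoch, None  # newer epoch, no vote yet
        if self.voted_for not in (None, candidate):
            return False  # already voted for someone else in this epoch
        self.voted_for = candidate
        self._persist()
        return True


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "quorum-state")
        first = Voter(path).handle_vote_request("A", 55)
        # Restart with the state file intact: the earlier vote is remembered.
        second = Voter(path).handle_vote_request("B", 55)
        # Restart after the state file is lost: the voter votes again.
        os.remove(path)
        third = Voter(path).handle_vote_request("B", 55)
        return first, second, third
```

In this model the voter refuses a second candidate in epoch 55 as long as its state file survives the restart, but grants a second vote in the same epoch once the file is gone. With enough voters in that position, two candidates can each collect a majority for the same epoch.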

 

Have you considered using KIP-853 (KRaft Controller Membership Changes)
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes]
or configuring your orchestration layer to reuse the same persistent volume
when the Kafka pod is restarted?

> Two Kafka brokers were not active in 3 node cluster setup
> ---------------------------------------------------------
>
>                 Key: KAFKA-19785
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19785
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, kraft
>    Affects Versions: 4.0.0
>            Reporter: Sravani
>            Priority: Major
>              Labels: kraft
>
> Hi Team,
> We faced a Kafka issue in which two of the Kafka brokers were fenced and
> Kafka was not able to process messages. We are using Kafka 4.0.0. The errors
> are below.
>  
> Sep 22 09:41:42 host kafka[42245]: [2025-09-22 07:41:42,419] ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> Sep 22 09:41:42 host kafka[42245]: java.lang.IllegalStateException: Received request or response with leader OptionalInt[3] and epoch 55 which is inconsistent with current leader OptionalInt.empty and epoch 55
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2528) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.maybeHandleCommonResponse(KafkaRaftClient.java:2484) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1707) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:2568) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2724) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3460) ~[kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.raft.KafkaRaftClientDriver.doWork(KafkaRaftClientDriver.java:64) [kafka-raft-4.0.0.jar:?]
> Sep 22 09:41:42 host kafka[42245]:     at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136) [kafka-server-common-4.0.0.jar:?]
> The metrics below show a FencedBrokerCount of 2.0:
> kafka_controller_KafkaController_Value{name="ActiveBrokerCount",} 1.0
> kafka_controller_KafkaController_Value{name="GlobalTopicCount",} 23.0
> kafka_controller_KafkaController_Value{name="FencedBrokerCount",} 2.0
> Please help us to resolve this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
