[ https://issues.apache.org/jira/browse/KAFKA-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josep Prat updated KAFKA-17752:
-------------------------------
    Labels: kraft  (was: )

> Controller crashes when removed if it is an initial controller
> --------------------------------------------------------------
>
>                 Key: KAFKA-17752
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17752
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>    Affects Versions: 3.9.0
>            Reporter: Juha Mynttinen
>            Priority: Major
>              Labels: kraft
>
> Hey,
> Tested using 3.9.0 RC0. The issue only affects KRaft.
> It seems that "kafka-metadata-quorum.sh remove-controller" causes the removed
> controller to crash if it is one of the controllers specified using
> "--initial-controllers".
>
> Steps to reproduce:
>
> Clean up and set up the environment:
> rm -rf /tmp/controllers && \
> mkdir -p /tmp/controllers/c1 && \
> mkdir -p /tmp/controllers/c2 && \
> mkdir -p /tmp/controllers/c3
> export KAFKA_HOME=<your_kafka_3_9_home>
>
> Format the controllers:
> $KAFKA_HOME/bin/kafka-storage.sh format --cluster-id 00000000-0000-0000-0000-000000000001 --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA --config c1.properties
> $KAFKA_HOME/bin/kafka-storage.sh format --cluster-id 00000000-0000-0000-0000-000000000001 --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA --config c2.properties
> $KAFKA_HOME/bin/kafka-storage.sh format --cluster-id 00000000-0000-0000-0000-000000000001 --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA --config c3.properties
>
> Start the controllers, in separate terminals:
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c1.properties
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c2.properties
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c3.properties
>
> Remove a controller:
> $KAFKA_HOME/bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:10001,localhost:10002,localhost:10003,localhost:10004 remove-controller --controller-id 1001 --controller-directory-id AAAAAAAAAAEAAAAAAAAAAA
>
> The process crashes with the following error:
> [2024-10-09 15:19:15,574] ERROR Encountered fatal fault: exception while renouncing leadership (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> java.lang.RuntimeException: Unable to reset to last stable offset 55. No in-memory snapshot found for this offset.
>     at org.apache.kafka.controller.OffsetControlManager.deactivate(OffsetControlManager.java:268)
>     at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1281)
>     at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:552)
>     at org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:180)
>     at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:885)
>     at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:875)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:153)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:142)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:215)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:186)
>     at java.base/java.lang.Thread.run(Thread.java:840)
>
> If the process that died is restarted, it joins the cluster and becomes an
> observer, as expected.
>
> The crash doesn't happen in a slightly different case. The exact steps are
> missing, but the idea is this:
> 1. Create a 3-controller cluster as above.
> 2. Format and start a 4th controller.
> 3. Add the 4th controller as a voter.
> 4. Remove the 4th controller to make it an observer. It becomes an observer
>    as expected.
>
> Because this case works, I'm guessing the crash is somehow related to the
> controller being one of the initial controllers.
> I didn't dig deeper into why the crash occurs.
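
The three format commands in the reproduction steps repeat the same --initial-controllers value. As a convenience when re-running the steps, that value could be built once and reused. The following is only a sketch: initial_controllers is a hypothetical helper, not part of Kafka, and it assumes that every controller listens on localhost and shares a single directory id, as in the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): builds the value passed to
# --initial-controllers from a directory id and a list of "id:port" pairs.
# Assumption: every controller runs on localhost and uses the same
# directory id, as in the reproduction steps.
# The body runs in a subshell, so its variables do not leak to the caller.
initial_controllers() (
  dir_id="$1"
  shift
  out=""
  for pair in "$@"; do
    id="${pair%%:*}"     # text before the colon: node id
    port="${pair##*:}"   # text after the colon: controller port
    # Prepend a comma only when out already holds an entry.
    out="${out}${out:+,}${id}@localhost:${port}:${dir_id}"
  done
  printf '%s\n' "$out"
)

INITIAL_CONTROLLERS="$(initial_controllers AAAAAAAAAAEAAAAAAAAAAA \
  1001:10001 1002:10002 1003:10003)"
printf '%s\n' "$INITIAL_CONTROLLERS"

# The value could then be reused for each format call, for example:
#   for n in 1 2 3; do
#     "$KAFKA_HOME/bin/kafka-storage.sh" format \
#       --cluster-id 00000000-0000-0000-0000-000000000001 \
#       --initial-controllers "$INITIAL_CONTROLLERS" \
#       --config "c$n.properties"
#   done
```

The format loop is left commented out so that running the snippet only prints the string and does not touch any storage directory.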