Uwe Eisele created KAFKA-6714:
---------------------------------

             Summary: KafkaController marks all Brokers as "Shutting down", 
though only one broker has been shut down
                 Key: KAFKA-6714
                 URL: https://issues.apache.org/jira/browse/KAFKA-6714
             Project: Kafka
          Issue Type: Bug
          Components: controller, core
    Affects Versions: 0.11.0.2
         Environment: Kafka Cluster on Amazon AWS EC2 r4.2xlarge instances with 
5 nodes and a Zookeeper Cluster on r4.2xlarge instances with 3 nodes. The 
Cluster is distributed across 2 availability zones.
            Reporter: Uwe Eisele


In our Kafka cluster we experienced a situation in which the Kafka controller 
has all brokers marked as "Shutting down", although only one broker has 
actually been shut down.

The last log entry about broker state before the one that reports all brokers 
as shutting down says that no brokers are shutting down.

The consequence of this inconsistent state is that the Kafka controller cannot 
elect a leader for any partition.
{code:java}
[2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
[2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
[2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
...
[2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster: Set(1, 2, 3, 4) (kafka.controller.KafkaController)
[2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
...
[2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
[2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)
[2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers:  (kafka.controller.KafkaController)
...
[2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] besides shutting down brokers 1,5,2,3,4. (state.change.logger)
{code}
The question is: why does the Kafka controller assume that all brokers are 
shutting down?
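
The election failure in the ERROR entry can be illustrated with a small model. 
This is a simplified sketch, not Kafka's actual code: the class 
ShutdownElectionSketch and the helper electLeader are hypothetical, and it only 
assumes that a new leader must be an ISR member that is not in the shutting 
down set, as the error message suggests.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ShutdownElectionSketch {

    // Hypothetical helper: return the first ISR member that is not marked
    // as shutting down, or an empty Optional if no such replica exists.
    static Optional<Integer> electLeader(List<Integer> isr, Set<Integer> shuttingDown) {
        return isr.stream()
                .filter(id -> !shuttingDown.contains(id))
                .findFirst();
    }

    public static void main(String[] args) {
        // ISR of the partition named in the ERROR entry.
        List<Integer> isr = Arrays.asList(1, 3, 5);

        // Expected state: only broker 1 is shutting down, so broker 3 can lead.
        Set<Integer> onlyOne = new HashSet<>(Arrays.asList(1));
        System.out.println(electLeader(isr, onlyOne)); // prints Optional[3]

        // Observed state: every broker is marked as shutting down.
        Set<Integer> all = new HashSet<>(Arrays.asList(1, 5, 2, 3, 4));
        System.out.println(electLeader(isr, all)); // prints Optional.empty
    }
}
```

With the shutting down set reported in the log (1,5,2,3,4), no ISR member 
remains eligible, which matches the "No other replicas in ISR" error.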

The only place we found in the Kafka code (0.11.0.2) where the shutting down 
broker set is modified is in the class _kafka.controller.KafkaController_, line 
1407, in the method _doControlledShutdown_.

{code:java}
info("Shutting down broker " + id)

if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
  throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))

controllerContext.shuttingDownBrokerIds.add(id)
{code}
However, if this were the cause, we would expect to see the log entry 
"Shutting down broker n" for every broker in the log file, but these entries 
are not there.
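
One way to check this against a controller log is sketched below. It is only a 
sketch under stated assumptions: the class ShutdownLogCheck and the method 
unexplained are hypothetical names, each log entry is assumed to be on a single 
line (as in the original log file), and the message formats are taken from the 
log excerpt in this report. It does not distinguish between controllers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShutdownLogCheck {

    // INFO entry written when a controlled shutdown is requested for a broker.
    private static final Pattern ANNOUNCED =
            Pattern.compile("Shutting down broker (\\d+)");
    // DEBUG entry listing the controller's current shutting down set.
    private static final Pattern REPORTED =
            Pattern.compile("All shutting down brokers: ([\\d,]+)");

    // Hypothetical helper: broker ids that appear in a reported shutting down
    // set without any matching "Shutting down broker n" entry.
    static Set<Integer> unexplained(List<String> lines) {
        Set<Integer> announced = new TreeSet<>();
        Set<Integer> reported = new TreeSet<>();
        for (String line : lines) {
            Matcher a = ANNOUNCED.matcher(line);
            if (a.find()) {
                announced.add(Integer.parseInt(a.group(1)));
            }
            Matcher r = REPORTED.matcher(line);
            if (r.find()) {
                for (String id : r.group(1).split(",")) {
                    reported.add(Integer.parseInt(id));
                }
            }
        }
        reported.removeAll(announced);
        return reported;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "[2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)",
            "[2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)",
            "[2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)",
            "[2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)");
        System.out.println(unexplained(lines)); // prints [2, 3, 4]
    }
}
```

For the entries quoted in this report, brokers 2, 3 and 4 are in the shutting 
down set without a corresponding "Shutting down broker n" entry.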

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
