[ https://issues.apache.org/jira/browse/KAFKA-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma updated KAFKA-3215:
-------------------------------
    Labels: controller  (was: )

> controller may not be started when there are multiple ZK session expirations
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3215
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3215
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>              Labels: controller
>
> Suppose that broker 1 is the controller and it has 2 consecutive ZK session
> expirations. In this case, two ZK session expiration events will be fired.
> 1. When handling the first ZK session expiration event,
> SessionExpirationListener.handleNewSession() can elect broker 1 itself as the
> new controller and initialize the states properly.
> 2. When handling the second ZK session expiration event,
> SessionExpirationListener.handleNewSession() first calls
> onControllerResignation(), which sets ReplicaStateMachine.hasStarted to
> false. It then proceeds to controller election in
> ZookeeperLeaderElector.elect() and tries to create the controller node in ZK.
> This fails since broker 1 has already registered itself as the controller
> node in ZK. We simply ignore the failure to create the controller node,
> since we assume the controller must be on another broker. However, in this
> case the controller is broker 1 itself, and ReplicaStateMachine.hasStarted
> is still false.
> 3. Now, if a new broker event is fired, the event is ignored in
> BrokerChangeListener.handleChildChange since ReplicaStateMachine.hasStarted
> is false. We are then in a situation where a controller is alive but won't
> react to any broker change event.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
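
The sequence described in steps 1 to 3 can be reproduced with a small, self-contained model. This is only a sketch, not Kafka code: FakeZk, Broker, and the field replicaStateMachineStarted are hypothetical stand-ins for the controller node in ZK, the broker-side controller logic, and ReplicaStateMachine.hasStarted. The method names handleNewSession, onControllerResignation, and elect mirror the ones named in the issue, but their bodies are simplified assumptions.

```java
public class ControllerExpirationSketch {

    /** Hypothetical stand-in for the controller node in ZK (-1 means absent). */
    static final class FakeZk {
        int controllerId = -1;

        /** Creates the node only if it does not exist yet. */
        boolean tryCreateControllerNode(int brokerId) {
            if (controllerId != -1) {
                return false;
            }
            controllerId = brokerId;
            return true;
        }
    }

    /** Hypothetical, heavily simplified model of the broker-side logic. */
    static final class Broker {
        final int id;
        final FakeZk zk;
        // Stand-in for ReplicaStateMachine.hasStarted.
        boolean replicaStateMachineStarted = false;

        Broker(int id, FakeZk zk) {
            this.id = id;
            this.zk = zk;
        }

        /** Models the listener callback: resign first, then re-elect. */
        void handleNewSession() {
            onControllerResignation();
            elect();
        }

        void onControllerResignation() {
            replicaStateMachineStarted = false;
        }

        void elect() {
            if (zk.tryCreateControllerNode(id)) {
                // Election won: controller state is initialized.
                replicaStateMachineStarted = true;
            }
            // Otherwise the failure is ignored on the assumption that
            // another broker is the controller. No check is made for the
            // case where the existing node belongs to this broker.
        }

        /** Returns true if the broker change event was processed. */
        boolean handleBrokerChange() {
            return replicaStateMachineStarted;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk(); // node from the expired session is gone
        Broker broker1 = new Broker(1, zk);

        broker1.handleNewSession(); // first expiration event
        System.out.println("after 1st event, started = "
                + broker1.replicaStateMachineStarted);

        broker1.handleNewSession(); // second expiration event
        System.out.println("after 2nd event, controller in ZK = "
                + zk.controllerId + ", started = "
                + broker1.replicaStateMachineStarted);

        System.out.println("broker change handled = "
                + broker1.handleBrokerChange());
    }
}
```

In this model the second call leaves broker 1 recorded as controller while the started flag stays false, so the broker change is dropped. One possible direction for a fix, stated here only as an assumption, is for elect() to compare the existing node's id with its own and re-initialize state when they match.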