Dong Lin created KAFKA-4442:
-------------------------------
Summary: Controller should grab lock when it is being initialized
to avoid race condition
Key: KAFKA-4442
URL: https://issues.apache.org/jira/browse/KAFKA-4442
Project: Kafka
Issue Type: Bug
Reporter: Dong Lin
Assignee: Dong Lin
Currently the controller registers the broker change listener before sending
LeaderAndIsrRequests to live replicas. The call path looks like this:
- onControllerFailover()
- partitionStateMachine.startup()
- triggerOnlinePartitionStateChange()
- handleStateChange(partition, OnlinePartition)
- electLeaderForPartition(partition)
- determines live replicas for this partition (step a)
- add partition to controllerContext.partitionLeadershipInfo. (step b)
- send LeaderAndIsrRequest to those live replicas for this partition
However, if a broker registers itself in ZooKeeper between step (a) and step
(b), onBrokerStartup() will not send a LeaderAndIsrRequest to this broker for
this partition, because the partition is not yet in
controllerContext.partitionLeadershipInfo. onControllerFailover() will not
send a LeaderAndIsrRequest to this broker for this partition either, because
the broker was not considered live in step (a).
The root cause is that onBrokerStartup() can run before the controller has
finished onControllerFailover() and initialized its state; it should only be
executed afterwards. Therefore the controller should hold the lock
controllerContext.controllerLock for the duration of onControllerFailover().
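
The race and the proposed fix can be illustrated with a small toy model. This
is only a sketch, not Kafka code: ToyController, run(), and the
between_a_and_b hook are hypothetical names invented for illustration, and the
model assumes that the broker-startup callback already runs under the same lock
that onControllerFailover() would take.

```python
import threading


class ToyController:
    """Toy model of the controller state described above (not Kafka code)."""

    def __init__(self, live_brokers, hold_lock_during_failover):
        # Stands in for controllerContext.controllerLock.
        self.controller_lock = threading.RLock()
        self.live_brokers = set(live_brokers)
        # Stands in for controllerContext.partitionLeadershipInfo.
        self.partition_leadership_info = {}
        # (broker, partition) pairs that were sent a LeaderAndIsrRequest.
        self.sent = set()
        self.hold_lock = hold_lock_during_failover

    def on_broker_startup(self, broker):
        # Assumption: the listener callback takes the controller lock.
        with self.controller_lock:
            self.live_brokers.add(broker)
            for partition in self.partition_leadership_info:
                self.sent.add((broker, partition))

    def on_controller_failover(self, partition, between_a_and_b):
        if self.hold_lock:
            # Proposed fix: hold the lock for the whole initialization.
            with self.controller_lock:
                self._elect_leader(partition, between_a_and_b)
        else:
            self._elect_leader(partition, between_a_and_b)

    def _elect_leader(self, partition, between_a_and_b):
        live = sorted(self.live_brokers)                     # step (a)
        between_a_and_b()                                    # a broker registers here
        self.partition_leadership_info[partition] = live[0]  # step (b)
        for broker in live:
            self.sent.add((broker, partition))


def run(hold_lock):
    """Return True if the late broker 2 ends up with a request for topic-0."""
    controller = ToyController([1], hold_lock_during_failover=hold_lock)
    startup = threading.Thread(target=controller.on_broker_startup, args=(2,))

    def broker_registers():
        startup.start()
        # Give the callback a chance to run; it blocks if the lock is held.
        startup.join(timeout=0.2)

    controller.on_controller_failover("topic-0", broker_registers)
    startup.join()
    return (2, "topic-0") in controller.sent


if __name__ == "__main__":
    print("without lock:", run(hold_lock=False))
    print("with lock:", run(hold_lock=True))
```

Without the lock, the callback for broker 2 runs between step (a) and step (b),
finds no partitions in the leadership map, and broker 2 is never sent a request.
With the lock held, the callback is deferred until initialization has finished
and then sees the partition.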
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)