[ https://issues.apache.org/jira/browse/KAFKA-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726975#comment-15726975 ]
ASF GitHub Bot commented on KAFKA-4442:
---------------------------------------
Github user lindong28 closed the pull request at:
https://github.com/apache/kafka/pull/2167
> Controller should grab lock when it is being initialized to avoid race condition
> --------------------------------------------------------------------------------
>
> Key: KAFKA-4442
> URL: https://issues.apache.org/jira/browse/KAFKA-4442
> Project: Kafka
> Issue Type: Bug
> Reporter: Dong Lin
> Assignee: Dong Lin
>
> Currently, the controller registers the broker change listener before it
> sends LeaderAndIsrRequests to live replicas. The call path looks like this:
> - onControllerFailover()
>   - partitionStateMachine.startup()
>     - triggerOnlinePartitionStateChange()
>       - handleStateChange(partition, OnlinePartition)
>         - electLeaderForPartition(partition)
>           - determine the live replicas for this partition (step a)
>           - add the partition to controllerContext.partitionLeadershipInfo (step b)
>           - send LeaderAndIsrRequest to those live replicas for this partition
> However, if a broker registers itself in ZooKeeper between step (a) and
> step (b), onBrokerStartup() will not send a LeaderAndIsrRequest to this
> broker for this partition, because the partition is not yet in
> controllerContext.partitionLeadershipInfo. Meanwhile, onControllerFailover()
> will not send a LeaderAndIsrRequest to this broker for this partition either,
> because the broker was not considered live in step (a). As a result, the
> broker never receives a LeaderAndIsrRequest for this partition.
> The root cause is that onBrokerStartup() can run before the controller has
> finished onControllerFailover() and initialized its state, whereas it should
> only run afterwards. Therefore, the controller should hold the lock
> controllerContext.controllerLock for the whole of onControllerFailover().
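A sketch of the proposed fix, again as a hypothetical Python model rather than Kafka's Scala code: failover and broker startup both run under one re-entrant lock, with threading.RLock assumed to play the role of controllerContext.controllerLock. The class and method names are illustrative only, and the broker change listener is modelled as a separate thread calling on_broker_startup().

```python
import threading


class LockedToyController:
    """Hypothetical model: failover and broker startup share one lock,
    so a broker startup cannot interleave with an in-progress failover."""

    def __init__(self, assigned_replicas, live_brokers):
        self.lock = threading.RLock()  # stands in for the controller lock
        self.assigned_replicas = assigned_replicas  # partition -> broker ids
        self.live_brokers = set(live_brokers)
        self.partition_leadership_info = {}  # partition -> leader id
        self.sent = []  # (broker, partition) LeaderAndIsrRequests "sent"

    def on_controller_failover(self, between_a_and_b=None):
        # The lock is held for the whole failover, including steps (a)
        # and (b) and the sends.
        with self.lock:
            for partition, replicas in self.assigned_replicas.items():
                live = [b for b in replicas if b in self.live_brokers]  # (a)
                if between_a_and_b is not None:
                    between_a_and_b()
                self.partition_leadership_info[partition] = live[0]  # (b)
                for broker in live:
                    self.sent.append((broker, partition))

    def on_broker_startup(self, broker):
        # Blocks until any in-progress failover has released the lock.
        with self.lock:
            self.live_brokers.add(broker)
            for partition in self.partition_leadership_info:
                if broker in self.assigned_replicas[partition]:
                    self.sent.append((broker, partition))


controller = LockedToyController({"t-0": [1, 2]}, live_brokers=[1])
# Broker 2 "registers" between steps (a) and (b): its handler thread starts
# but must wait for the lock held by on_controller_failover().
listener = threading.Thread(target=controller.on_broker_startup, args=(2,))
controller.on_controller_failover(between_a_and_b=listener.start)
listener.join()
print(controller.sent)  # [(1, 't-0'), (2, 't-0')]
```

Because the listener thread cannot take the lock until on_controller_failover() returns, it always finds t-0 in partition_leadership_info, so the result does not depend on thread scheduling. A re-entrant lock is used so that a method already holding the lock could call another locked method on the same thread without deadlocking.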
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)