[
https://issues.apache.org/jira/browse/KAFKA-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694773#comment-15694773
]
ASF GitHub Bot commented on KAFKA-4442:
---------------------------------------
GitHub user lindong28 opened a pull request:
https://github.com/apache/kafka/pull/2167
KAFKA-4442; Controller should grab lock when it is being initialized to
avoid race condition
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lindong28/kafka KAFKA-4442
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/2167.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2167
----
commit 16825e60963844ab0729bf290cfc9e6cee79932f
Author: Dong Lin <[email protected]>
Date: 2016-11-25T04:07:09Z
KAFKA-4442; Controller should grab lock when it is being initialized to
avoid race condition
----
> Controller should grab lock when it is being initialized to avoid race
> condition
> --------------------------------------------------------------------------------
>
> Key: KAFKA-4442
> URL: https://issues.apache.org/jira/browse/KAFKA-4442
> Project: Kafka
> Issue Type: Bug
> Reporter: Dong Lin
> Assignee: Dong Lin
>
> Currently the controller registers the broker change listener before sending
> LeaderAndIsrRequests to live replicas. The call path looks like this:
> - onControllerFailover()
> - partitionStateMachine.startup()
> - triggerOnlinePartitionStateChange()
> - handleStateChange(partition, OnlinePartition)
> - electLeaderForPartition(partition)
> - determines live replicas for this partition (step a)
> - adds the partition to controllerContext.partitionLeadershipInfo (step b)
> - sends LeaderAndIsrRequest to those live replicas for this partition
> However, if a broker registers itself in ZooKeeper between step (a) and
> step (b), onBrokerStartup() will not send LeaderAndIsrRequest to this
> broker for this partition, because the partition is not yet in
> controllerContext.partitionLeadershipInfo. onControllerFailover() will
> not send LeaderAndIsrRequest to this broker for this partition either, because
> the broker was not considered live in step (a).
> The root cause is that onBrokerStartup() can run before the controller has
> finished onControllerFailover() and initialized its state; it should only
> run afterwards. Therefore the controller should hold the lock
> controllerContext.controllerLock during onControllerFailover().
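
The interleaving described above can be reproduced with a small toy model. The sketch below is not Kafka's controller code: the class ControllerLockSketch, the simulate() helper, and the "partition->broker" strings are hypothetical stand-ins, and only the method and field names echo the issue text. It assumes the broker-change callback always takes the controller lock, and it compares a failover that does not hold the lock with one that does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: names echo the issue text, but none of this is Kafka's code.
class ControllerLockSketch {
    private final ReentrantLock controllerLock = new ReentrantLock();
    private final Set<Integer> liveBrokers = new TreeSet<>();
    private final Set<String> partitionLeadershipInfo = new TreeSet<>();
    private final List<String> sent = new ArrayList<>(); // "partition->broker"
    private final boolean lockDuringFailover;

    ControllerLockSketch(boolean lockDuringFailover) {
        this.lockDuringFailover = lockDuringFailover;
        liveBrokers.add(1); // broker 1 is already registered
    }

    // Broker-change callback (assumed to always take the controller lock).
    void onBrokerStartup(int brokerId) {
        controllerLock.lock();
        try {
            liveBrokers.add(brokerId);
            // Only partitions already in partitionLeadershipInfo are sent.
            for (String partition : partitionLeadershipInfo) {
                sent.add(partition + "->" + brokerId);
            }
        } finally {
            controllerLock.unlock();
        }
    }

    void onControllerFailover(String partition, int lateBroker) throws InterruptedException {
        Thread callback = new Thread(() -> onBrokerStartup(lateBroker));
        if (lockDuringFailover) controllerLock.lock();
        try {
            Set<Integer> live = new TreeSet<>(liveBrokers);      // step (a)
            // A broker registers between step (a) and step (b).
            callback.start();
            // Wait until the callback has finished or is queued on the lock.
            while (callback.isAlive() && !controllerLock.hasQueuedThread(callback)) {
                Thread.yield();
            }
            partitionLeadershipInfo.add(partition);              // step (b)
            for (int broker : live) {
                sent.add(partition + "->" + broker);
            }
        } finally {
            if (lockDuringFailover) controllerLock.unlock();
        }
        callback.join();
    }

    static List<String> simulate(boolean lockDuringFailover) {
        try {
            ControllerLockSketch c = new ControllerLockSketch(lockDuringFailover);
            c.onControllerFailover("t-0", 2);
            return c.sent;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // [t-0->1]          broker 2 is missed
        System.out.println(simulate(true));  // [t-0->1, t-0->2]  broker 2 is covered
    }
}
```

Without the lock, the callback runs inside the window, finds no partition in partitionLeadershipInfo, and broker 2 never receives a request. With the lock held for the whole failover, the callback is deferred until initialization is complete and then sends the request itself.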
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)