[ https://issues.apache.org/jira/browse/KAFKA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731174#comment-13731174 ]
Swapnil Ghike commented on KAFKA-999:
-------------------------------------
Actually that's not needed, will get a patch out in a couple hours.
> Controlled shutdown never succeeds until the broker is killed
> -------------------------------------------------------------
>
> Key: KAFKA-999
> URL: https://issues.apache.org/jira/browse/KAFKA-999
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Priority: Critical
>
> A race condition between the broker's handling of the LeaderAndIsr request
> and controlled shutdown can lead to a situation where controlled shutdown can
> never succeed, and the only way to bounce the broker is to kill it.
> The root cause is that the broker uses a check to avoid fetching from a leader
> that is not alive according to the controller. This leads to the broker
> aborting a become-follower request. In cases where the replication factor is
> 2, leadership can never be transferred to the follower, since the follower
> keeps rejecting the become-follower request and stays out of the ISR. This
> causes controlled shutdown to fail forever.
> One sequence of events that led to this bug is as follows -
> - Broker 2 is leader and controller
> - Broker 2 is bounced (uncontrolled shutdown)
> - Controller fails over
> - Controlled shutdown is invoked on broker 1
> - Controller starts leader election for partitions that broker 2 led
> - Controller sends a become-follower request to broker 2, naming broker 1 as
> leader. At the same time, it does not include broker 1 in the alive broker
> list sent as part of the LeaderAndIsr request
> - Broker 2 rejects the LeaderAndIsr request since the leader is not in the
> list of alive brokers
> - Broker 1 fails to transfer leadership to broker 2 since broker 2 is not in
> ISR
> - Controlled shutdown can never succeed on broker 1
> Since controlled shutdown is a config option, if there are bugs in controlled
> shutdown, there is no option but to kill the broker.
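
The sequence in the description can be sketched as a small Python simulation. This is only an illustrative model, not Kafka's code: the names `LeaderAndIsrRequest`, `Broker`, `handle_become_follower`, and `controlled_shutdown_attempt` are hypothetical, the follower is assumed to join the ISR as soon as it accepts the request, and the `include_shutting_down_broker` flag is a made-up variation used for contrast, not the actual patch.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderAndIsrRequest:
    partition: str
    leader: int
    alive_brokers: frozenset  # brokers the controller reports as alive


@dataclass
class Broker:
    broker_id: int
    following: dict = field(default_factory=dict)  # partition -> leader id

    def handle_become_follower(self, req: LeaderAndIsrRequest) -> bool:
        # The check from the report: refuse to fetch from a leader that the
        # controller does not list as alive, and abort the request.
        if req.leader not in req.alive_brokers:
            return False
        self.following[req.partition] = req.leader
        return True


def controlled_shutdown_attempt(shutting_down_id, follower, partition, isr,
                                include_shutting_down_broker=False):
    """Model one retry of controlled shutdown for a single partition.

    Returns True if leadership could move off the shutting-down broker,
    i.e. if some other broker is in the ISR after this attempt.
    """
    alive = {follower.broker_id}
    if include_shutting_down_broker:  # hypothetical variation, for contrast
        alive.add(shutting_down_id)
    req = LeaderAndIsrRequest(partition, shutting_down_id, frozenset(alive))
    if follower.handle_become_follower(req):
        isr.add(follower.broker_id)  # simplification: follower catches up at once
    return any(b != shutting_down_id for b in isr)


# Replication factor 2: broker 1 leads and is shutting down, broker 2 has
# just come back and is not yet in the ISR.
isr = {1}
broker2 = Broker(2)
attempts = [controlled_shutdown_attempt(1, broker2, "topic-0", isr)
            for _ in range(3)]
print(attempts, isr)
```

In this model every retry fails, because broker 2 rejects the request each time and so never enters the ISR. Calling the same function with `include_shutting_down_broker=True` lets broker 2 accept the request and join the ISR, which suggests that the alive-broker check is what keeps the shutdown stuck.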