[
https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124325#comment-16124325
]
Pierre Mage commented on KAFKA-5074:
------------------------------------
Running 0.11.0 and observing similar behaviour.
Sequence of events recorded in logs:
1. ZooKeeper session expires
2. Kafka controller stops broker 0
3. Kafka re-registers broker 0 in ZooKeeper
4. Leader cache \[mytopic,29\] ->
(Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1)
5. Invoking state change to OfflineReplica for replicas
\[Topic=mytopic,Partition=29,Replica=0\]
6. Retaining last ISR 0 of partition \[mytopic,29\] since unclean leader
election is disabled
7. New leader and ISR for partition \[mytopic,29\] is
{"leader":-1,"leader_epoch":4,"isr":[0]}
8. Not sending request (type=StopReplicaRequest...) to broker 0, since it is
offline
9. Invoking state change to OnlineReplica for replicas
\[Topic=mytopic,Partition=29,Replica=0\]
10. Cycle of failing preferred leader elections starts
OfflinePartitionLeaderSelector is never invoked because the partition's state
is still OnlinePartition, even though its leader is -1.
{code}
ERROR Controller 2 epoch 4 encountered error while electing leader for
partition [mytopic,29] due to: Preferred replica 2 for partition [mytopic,29]
is either not alive or not in the isr. Current leader and ISR
[{"leader":-1,"leader_epoch":4,"isr":[0]}].
ERROR Controller 2 epoch 4 initiated state change for partition [mytopic,29]
from OnlinePartition to OnlinePartition failed
{code}
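To make the cycle concrete, here is a minimal Python sketch of the two election paths as described in the logs above. It is a simplified model, not Kafka's controller code: the names PartitionState, elect_preferred_leader and controller_pass are hypothetical, and the replica assignment [2, 0] is an assumption inferred from the preferred replica (2) and the earlier ISR (2,0).

```python
from dataclasses import dataclass
from typing import List, Set

NO_LEADER = -1


@dataclass
class PartitionState:
    assigned_replicas: List[int]  # first entry is treated as the preferred replica
    leader: int
    isr: List[int]
    state: str  # "OnlinePartition" or "OfflinePartition"


def elect_preferred_leader(p: PartitionState, live_brokers: Set[int]) -> int:
    """Model of the preferred-replica path: only the preferred replica may win."""
    preferred = p.assigned_replicas[0]
    if preferred in live_brokers and preferred in p.isr:
        return preferred
    raise RuntimeError(
        f"Preferred replica {preferred} is either not alive or not in the isr"
    )


def controller_pass(p: PartitionState, live_brokers: Set[int]) -> str:
    """One controller attempt; which path runs depends only on p.state."""
    if p.state == "OfflinePartition":
        # Model of the offline path: any live ISR member is acceptable.
        candidates = [b for b in p.isr if b in live_brokers]
        if not candidates:
            return "offline election failed: no live replica in ISR"
        p.leader = candidates[0]
        p.state = "OnlinePartition"
        return f"offline election succeeded: leader {p.leader}"
    try:
        p.leader = elect_preferred_leader(p, live_brokers)
        return f"preferred election succeeded: leader {p.leader}"
    except RuntimeError as err:
        # State stays OnlinePartition, so the next pass takes this path again.
        return f"preferred election failed: {err}"


if __name__ == "__main__":
    # Values taken from the log sequence: leader -1, ISR [0], preferred replica 2.
    partition = PartitionState([2, 0], NO_LEADER, [0], "OnlinePartition")
    for _ in range(3):
        print(controller_pass(partition, live_brokers={0, 2}))
```

In this model every pass fails the same way, because the preferred replica 2 is alive but absent from the ISR and the state never leaves OnlinePartition. If the state were OfflinePartition instead, the offline path in the sketch would pick broker 0, the only ISR member.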
> Transition to OnlinePartition without preferred leader in ISR fails
> -------------------------------------------------------------------
>
> Key: KAFKA-5074
> URL: https://issues.apache.org/jira/browse/KAFKA-5074
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.9.0.0
> Reporter: Dustin Cote
>
> Running 0.9.0.0, the controller can get into a state where it is no longer
> able to elect a leader for an Offline partition. It's unclear how this state
> is first reached, but in the steady state, this happens:
> -There are partitions with a leader of -1
> -The Controller repeatedly attempts a preferred leader election for these
> partitions
> -The preferred leader election fails because the only replica in the ISR is
> not the preferred leader
> The log cycle looks like this:
> {code}
> [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica
> leader election for partitions topic,1
> [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]:
> Invoking state change to OnlinePartition for partitions topic,1
> [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]:
> Current leader -1 for partition [topic,1] is not the preferred replica.
> Trigerring preferred replica leader election
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to
> complete preferred replica leader election. Leader is -1
> (kafka.controller.KafkaController)
> {code}
> It's not clear if this would happen on versions later than 0.9.0.0.