[ https://issues.apache.org/jira/browse/KAFKA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039417#comment-14039417 ]
Jianwen Wang commented on KAFKA-1503:
-------------------------------------

Hi Guozhang,

That is not what I see in my real cluster environment. The cluster has three brokers, all live and running, and a topic with 10 partitions. I just queried the cluster and got the following topic info:

root@kafka-1:~/kafka-0.8.1.1-src# bin/kafka-topics.sh --zookeeper localhost:2181 --topic ruby-p10 --describe
Topic:ruby-p10  PartitionCount:10  ReplicationFactor:3  Configs:
    Topic: ruby-p10  Partition: 0  Leader: 2  Replicas: 1,2,3  Isr: 2,1,3
    Topic: ruby-p10  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,1,3
    Topic: ruby-p10  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 2,1,3
    Topic: ruby-p10  Partition: 3  Leader: 1  Replicas: 1,3,2  Isr: 2,1,3
    Topic: ruby-p10  Partition: 4  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
    Topic: ruby-p10  Partition: 5  Leader: 3  Replicas: 3,2,1  Isr: 2,1,3
    Topic: ruby-p10  Partition: 6  Leader: 2  Replicas: 1,2,3  Isr: 2,1,3
    Topic: ruby-p10  Partition: 7  Leader: 2  Replicas: 2,3,1  Isr: 2,1,3
    Topic: ruby-p10  Partition: 8  Leader: 3  Replicas: 3,1,2  Isr: 2,1,3
    Topic: ruby-p10  Partition: 9  Leader: 1  Replicas: 1,3,2  Isr: 2,1,3

As you can see, the ISR lists are not evenly distributed: every partition reports the ISR in the same order (2,1,3), while the replica lists rotate across the brokers. The current code always picks the first live broker in the ISR. So if broker 1 goes down, every partition whose leader is broker 1 will move to broker 2, instead of being spread evenly across broker 2 and broker 3. As time goes on, all the partitions will end up with the same broker as their leader.
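To make the difference concrete, here is a rough sketch that replays the output above for the case where broker 1 goes down. It is not the controller code: the object LeaderSelectionSketch, the case class PartitionState, and the helpers firstLiveInIsr, firstLiveReplicaInIsr and leaderCounts are hypothetical names used only for illustration. It also assumes that only partitions led by the failed broker get a new leader, and that the ISR keeps its order once the failed broker is dropped.

```scala
object LeaderSelectionSketch {
  // One row of the kafka-topics.sh --describe output (hypothetical type).
  final case class PartitionState(id: Int, leader: Int, replicas: Seq[Int], isr: Seq[Int])

  // Every partition in the output reports the ISR in the same order.
  private val isr = Seq(2, 1, 3)

  val partitions: Seq[PartitionState] = Seq(
    PartitionState(0, 2, Seq(1, 2, 3), isr),
    PartitionState(1, 2, Seq(2, 3, 1), isr),
    PartitionState(2, 3, Seq(3, 1, 2), isr),
    PartitionState(3, 1, Seq(1, 3, 2), isr),
    PartitionState(4, 2, Seq(2, 1, 3), isr),
    PartitionState(5, 3, Seq(3, 2, 1), isr),
    PartitionState(6, 2, Seq(1, 2, 3), isr),
    PartitionState(7, 2, Seq(2, 3, 1), isr),
    PartitionState(8, 3, Seq(3, 1, 2), isr),
    PartitionState(9, 1, Seq(1, 3, 2), isr)
  )

  // Behaviour described in the issue: first live broker in ISR order.
  def firstLiveInIsr(p: PartitionState, live: Set[Int]): Option[Int] =
    p.isr.find(live.contains)

  // Proposed behaviour: first live broker in replica-list order that is also in the ISR.
  def firstLiveReplicaInIsr(p: PartitionState, live: Set[Int]): Option[Int] =
    p.replicas.find(r => live.contains(r) && p.isr.contains(r))

  // Number of partitions led by each broker after `failed` goes down.
  // Assumption: only partitions that were led by `failed` are re-elected.
  def leaderCounts(failed: Int,
                   select: (PartitionState, Set[Int]) => Option[Int]): Map[Int, Int] = {
    val live = partitions.flatMap(_.replicas).toSet - failed
    val newLeaders = partitions.flatMap { p =>
      if (p.leader == failed) select(p, live) else Some(p.leader)
    }
    newLeaders.groupBy(identity).map { case (broker, led) => broker -> led.size }
  }

  def main(args: Array[String]): Unit = {
    println("first live broker in ISR:        " + leaderCounts(1, firstLiveInIsr))
    println("first live replica also in ISR:  " + leaderCounts(1, firstLiveReplicaInIsr))
  }
}
```

For this snapshot, picking the first live broker in the ISR leaves broker 2 leading 7 partitions and broker 3 leading 3, while picking the first live replica that is also in the ISR leaves each of them leading 5. The numbers depend on the leaders shown in the output above, so other snapshots may come out differently.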

> all partitions are using same broker as their leader after broker is down
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-1503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1503
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.0, 0.8.1.1
>         Environment: 0.8.1.1
>            Reporter: Jianwen Wang
>            Assignee: Neha Narkhede
>
> The current leader selection always picks the first live broker in the ISR
> when the current leader broker is down. Since the list liveBrokersInIsr is
> not evenly distributed, as time goes on all the partitions will use only one
> broker as their leader.
> I figured out a fix, which is to use the first live broker in the replica
> list that is also in the ISR list. Since liveAssignedReplicas is evenly
> distributed across brokers, the partition leaders will be evenly distributed
> across the live brokers in the ISR.
> The fix is:
> kafka-0.8.1.1-src/core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala
>      case false =>
> -      val newLeader = liveBrokersInIsr.head
> +      val liveReplicasInIsr = liveAssignedReplicas.filter(r => liveBrokersInIsr.contains(r))
> +      val newLeader = liveReplicasInIsr.head

--
This message was sent by Atlassian JIRA
(v6.2#6252)