[ https://issues.apache.org/jira/browse/KAFKA-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755805#comment-17755805 ]
Sagar Rao commented on KAFKA-15354:
-----------------------------------

[~dengziming], I took a look at this. I believe this happens because, when we look for the first replica of a new partition, [here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L362], we set the index back to 0 when the epochs don't match [here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L190]. In the test case you supplied, when we add partition 2, the epoch known to the brokers in rack 1 is 1 but the incoming epoch is 2, so the index is reset to 0. I think that's why broker 1 is assigned as the leader again in this round. WDYT?

> Partition leader is not evenly distributed in kraft mode
> --------------------------------------------------------
>
>                 Key: KAFKA-15354
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15354
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Deng Ziming
>            Priority: Major
>
> In StripedReplicaPlacerTest, we can create the test below to reproduce this bug.
> {code:java}
> @Test
> public void testReplicaDistribution() {
>     MockRandom random = new MockRandom();
>     StripedReplicaPlacer placer = new StripedReplicaPlacer(random);
>     TopicAssignment assignment = place(placer, 0, 4, (short) 2, Arrays.asList(
>         new UsableBroker(0, Optional.of("0"), false),
>         new UsableBroker(1, Optional.of("0"), false),
>         new UsableBroker(2, Optional.of("1"), false),
>         new UsableBroker(3, Optional.of("1"), false)));
>     System.out.println(assignment);
> } {code}
> In StripedReplicaPlacer, we only ensure that leaders are distributed evenly across
> racks; we don't ensure that leaders are evenly distributed across nodes. In
> the test above, we have 4 nodes: 1 2 3 4, and create 4 partitions, but the
> leaders are 1 2 1 2.
> In zk mode, by contrast, this is ensured; see
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
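
The reset described in the comment above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not the StripedReplicaPlacer code, the names `EpochResetDemo`, `RackCursor`, and `placeLeaders` are hypothetical, and it leaves out shuffling and everything else the real placer does, so the broker IDs it prints need not match the ones reported in the issue. It only shows that if a per-rack cursor returns to index 0 whenever it sees a new epoch, and every partition brings a new epoch, the same broker in each rack keeps being picked as leader.

```java
import java.util.ArrayList;
import java.util.List;

public class EpochResetDemo {
    /** Hypothetical per-rack cursor; a simplification, not Kafka's own class. */
    static class RackCursor {
        private final List<Integer> brokers;
        private int epoch = 0;
        private int index = 0;

        RackCursor(List<Integer> brokers) {
            this.brokers = brokers;
        }

        /** Returns the next broker for the given epoch, or -1 if the rack is exhausted. */
        int next(int epoch) {
            if (this.epoch != epoch) {
                // Epoch mismatch: remember the new epoch and start over at index 0.
                this.epoch = epoch;
                this.index = 0;
            }
            if (index >= brokers.size()) {
                return -1;
            }
            return brokers.get(index++);
        }
    }

    /** Picks one leader per partition, alternating racks, with a new epoch each time. */
    static List<Integer> placeLeaders(int numPartitions) {
        List<RackCursor> racks = List.of(
            new RackCursor(List.of(0, 1)),   // assumed rack "0"
            new RackCursor(List.of(2, 3)));  // assumed rack "1"
        List<Integer> leaders = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            int epoch = partition + 1; // assumption: every partition uses a fresh epoch
            RackCursor leaderRack = racks.get(partition % racks.size());
            leaders.add(leaderRack.next(epoch));
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Only brokers 0 and 2 ever lead; brokers 1 and 3 are never chosen.
        System.out.println(placeLeaders(4)); // prints [0, 2, 0, 2]
    }
}
```

In this model the leaders are balanced across the two racks but not across the brokers inside each rack, which is the shape of the imbalance the issue describes.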