[ 
https://issues.apache.org/jira/browse/KAFKA-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755805#comment-17755805
 ] 

Sagar Rao commented on KAFKA-15354:
-----------------------------------

[~dengziming], I took a look at this. I believe this happens because, when we 
look for the first replica of a new partition 
[here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L362],
 the index is reset to 0 if the epochs don't match 
[here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L190].

In the test case you supplied, when partition 2 is added, the epoch known to 
the brokers in rack 1 is 1, but the incoming epoch is 2, so the index is reset 
to 0. I think that's why broker 1 is assigned as the leader again in this 
round. WDYT?
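
To illustrate the reset described above, here is a minimal sketch of a per-rack cursor. It is a simplified, hypothetical model and not the actual StripedReplicaPlacer code: the class name RackCursor, its fields, and the broker IDs are assumptions made for illustration only. It shows how resetting the index on an epoch mismatch makes the rack hand out the same first broker in every new round.

```java
import java.util.List;

// Hypothetical, simplified model of the per-rack cursor described above.
// This is NOT the real StripedReplicaPlacer code; all names are illustrative.
final class RackCursor {
    private final List<Integer> brokers;
    private int epoch = -1; // epoch of the last placement round this rack saw
    private int index = 0;  // position of the next broker to hand out

    RackCursor(List<Integer> brokers) {
        this.brokers = brokers;
    }

    // Returns the next broker for the given round, or -1 if the rack is exhausted.
    int next(int newEpoch) {
        if (newEpoch != epoch) {
            // Epoch mismatch: a new round has started, so start over at index 0.
            epoch = newEpoch;
            index = 0;
        }
        if (index >= brokers.size()) {
            return -1;
        }
        return brokers.get(index++);
    }

    public static void main(String[] args) {
        RackCursor rack = new RackCursor(List.of(0, 1));
        System.out.println(rack.next(1)); // 0: first pick in round 1
        System.out.println(rack.next(2)); // 0: the new epoch reset the index
        System.out.println(rack.next(2)); // 1: same epoch, so the index advances
    }
}
```

In this model, the first pick in any new round always comes from index 0, so a rack that supplies the leader in several rounds keeps returning the same broker.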

> Partition leader is not evenly distributed in kraft mode
> --------------------------------------------------------
>
>                 Key: KAFKA-15354
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15354
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Deng Ziming
>            Priority: Major
>
> In StripedReplicaPlacerTest, we can create a test below to reproduce this bug.
> {code:java}
> @Test
> public void testReplicaDistribution() {
>     MockRandom random = new MockRandom();
>     StripedReplicaPlacer placer = new StripedReplicaPlacer(random);
>     TopicAssignment assignment = place(placer, 0, 4, (short) 2, Arrays.asList(
>             new UsableBroker(0, Optional.of("0"), false),
>             new UsableBroker(1, Optional.of("0"), false),
>             new UsableBroker(2, Optional.of("1"), false),
>             new UsableBroker(3, Optional.of("1"), false)));
>     System.out.println(assignment);
> }
> {code}
> In StripedReplicaPlacer, we only ensure that leaders are distributed evenly 
> across racks; we don't ensure that they are evenly distributed across nodes. 
> In the test above, we have 4 nodes (1 2 3 4) and create 4 partitions, but the 
> leaders are 1 2 1 2. In ZK mode this is ensured; see 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
