[ https://issues.apache.org/jira/browse/KAFKA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

GEORGE LI updated KAFKA-8663:
-----------------------------
    Description:
From my observation during reassignment, the replica ordering of the partition assignment changes, because the interim assignment is the set union OAR + RAR (original replicas + reassignment replicas).

However, it seems that the preferred leader changes during the reassignment. Normally, if no preferred leader election runs in the cluster, the leader is still the old leader. But if a leader election happens during the reassignment, the leadership changes. This causes some side effects. Let's look at this example.

Before the reassignment:
{code}
Topic:georgeli_test  PartitionCount:8  ReplicationFactor:3  Configs:
    Topic: georgeli_test  Partition: 0  Leader: 1026  Replicas: 1026,1028,1025  Isr: 1026,1028,1025
{code}

During the reassignment (1026,1028,1025) => (1027,1025,1028):
{code}
Topic:georgeli_test  PartitionCount:8  ReplicationFactor:4  Configs:leader.replication.throttled.replicas=0:1026,0:1028,0:1025,follower.replication.throttled.replicas=0:1027
    Topic: georgeli_test  Partition: 0  Leader: 1026  Replicas: 1027,1025,1028,1026  Isr: 1026,1028,1025
{code}

Notice that the leader remains 1026, but the replica list is now 1027,1025,1028,1026. If we run a preferred leader election, it will try 1027 first, then 1025. Once 1027 is in the ISR, the final assignment will be (1027,1025,1028).

My proposal for a minor improvement is to keep the original replica ordering during the reassignment (which can take a long time for big topics/partitions), and to set the partition assignment to the new reassignment only after all replicas are in the ISR.

{code}
val newAndOldReplicas = (reassignedPartitionContext.newReplicas ++ controllerContext.partitionReplicaAssignment(topicPartition)).toSet
//1. Update AR in ZK with OAR + RAR.
updateAssignedReplicasForPartition(topicPartition, newAndOldReplicas.toSeq)
{code}

The code above would change to the following, so that the original ordering comes first during the reassignment:

{code}
val newAndOldReplicas = (controllerContext.partitionReplicaAssignment(topicPartition) ++ reassignedPartitionContext.newReplicas).toSet
{code}


> partition assignment would be better original_assignment + new_reassignment during reassignments
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8663
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8663
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 1.1.1, 2.3.0
>            Reporter: GEORGE LI
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
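
To make the ordering effect concrete, here is a minimal Scala sketch using the broker ids from the example in this issue. The object `ReassignmentOrdering` and the helpers `rarFirst` and `oarFirst` are hypothetical names for illustration and are not part of the Kafka controller. The sketch also assumes `.distinct` in place of the `.toSet` used in the controller snippet, since `.distinct` keeps first-occurrence order for any number of replicas, whereas Scala's small immutable sets keep insertion order only up to four elements.

```scala
object ReassignmentOrdering {
  // Broker ids taken from the example in this issue.
  val oar: Seq[Int] = Seq(1026, 1028, 1025) // original assigned replicas
  val rar: Seq[Int] = Seq(1027, 1025, 1028) // reassignment target replicas

  // Ordering observed today: target replicas first, remaining originals appended.
  // (Hypothetical helper; uses distinct instead of toSet to keep order explicit.)
  def rarFirst(oar: Seq[Int], rar: Seq[Int]): Seq[Int] =
    (rar ++ oar).distinct

  // Ordering proposed in this issue: originals first, new replicas appended.
  def oarFirst(oar: Seq[Int], rar: Seq[Int]): Seq[Int] =
    (oar ++ rar).distinct

  def main(args: Array[String]): Unit = {
    // The head of each list is the replica a preferred leader election tries first.
    println(rarFirst(oar, rar).mkString(",")) // 1027,1025,1028,1026
    println(oarFirst(oar, rar).mkString(",")) // 1026,1028,1025,1027
  }
}
```

With the original replicas first, the head of the interim assignment is still 1026, the current leader, so a preferred leader election that runs during the reassignment should leave the leadership where it is.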