[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148680#comment-14148680 ]
Ewen Cheslack-Postava commented on KAFKA-1631:
----------------------------------------------
Right. Unfortunately most of the system isn't aware of the large-scale change
(reassign old set -> new set), only of each intermediate state (old set -> old
set + new set -> new set). As it stands, UnderReplicatedPartitions is
computed by the Partition class, which is created by ReplicaManager. But the
high-level reassignment is managed by KafkaController, which looks like the
only place the necessary state is maintained. I think getting the semantics you
want may require a much more substantial change, since each partition leader
would need to know about the partition reassignment rather than just the
controller.

On the other hand, while I think it's less than ideal, the current behavior
could certainly be argued to be reasonable -- i.e. that reassignment is not
"natively" supported, it's just a higher-level operation you can build up. In
this case, the intermediate step is expected, and the temporary reporting of
under-replication would make sense since for a time the desired replication of
(old set + new set) has not been achieved.
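
To make the difference concrete, here is a rough sketch (in Python, not the actual Scala code). It assumes the leader-side check only compares the ISR size against the replica list the leader currently sees, which during the intermediate state is (old set + new set). The second function is purely hypothetical: it shows what a check could look like if the leader also knew the intended replication factor.

```python
def is_under_replicated(assigned_replicas, isr):
    # Assumed current behavior: the leader only sees its assigned replica
    # list, which during reassignment is (old set + new set).
    return len(isr) < len(assigned_replicas)


def is_under_replicated_vs_target(isr, target_replication_factor):
    # Hypothetical reassignment-aware variant: compare against the intended
    # replication factor, which today only the controller could supply.
    return len(isr) < target_replication_factor


# Partition 0 as reported in the issue: five replicas listed, four in sync,
# on a topic whose intended replication factor is 3.
assigned = [14, 13, 12, 11, 15]
isr = [14, 12, 11, 13]

print(is_under_replicated(assigned, isr))
print(is_under_replicated_vs_target(isr, 3))
```

With the numbers from the issue, the first check flags the partition and the second does not, which is the gap between the two semantics.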
> ReplicationFactor and under-replicated partitions incorrect during
> reassignment
> -------------------------------------------------------------------------------
>
> Key: KAFKA-1631
> URL: https://issues.apache.org/jira/browse/KAFKA-1631
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.1.1
> Reporter: Ryan Berdeen
> Labels: newbie
> Attachments: KAFKA-1631-v1.patch
>
>
> We have a topic with a replication factor of 3. We monitor
> UnderReplicatedPartitions as recommended by the documentation.
> During a partition reassignment, partitions being reassigned are reported as
> under-replicated. Running a describe shows:
> {code}
> Topic:activity-wal-1 PartitionCount:15 ReplicationFactor:5 Configs:
> Topic: activity-wal-1 Partition: 0 Leader: 14 Replicas: 14,13,12,11,15 Isr: 14,12,11,13
> Topic: activity-wal-1 Partition: 1 Leader: 14 Replicas: 15,14,11 Isr: 14,11
> Topic: activity-wal-1 Partition: 2 Leader: 11 Replicas: 11,15,12 Isr: 12,11,15
> ...
> {code}
> It looks like the displayed replication factor, 5, is simply the number of
> replicas listed for the first partition, which includes both brokers in the
> current list and those onto which the partition is being reassigned.
> Partition 0 is also included in the list when using the
> `--under-replicated-partitions` option, even though it is replicated to more
> brokers than the true replication factor.
> During a reassignment, the under-replicated partitions metric is not usable,
> meaning that actual under-replicated partitions can go unnoticed.
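
For illustration, the describe behavior reported in the issue can be mimicked with a few lines of Python. This is only a sketch based on the reporter's observation that the displayed ReplicationFactor matches the first partition's replica count; it is not taken from the tool's source. The `most_common_replica_count` helper is a hypothetical alternative, included just to show how a per-topic figure could be less sensitive to one partition mid-reassignment.

```python
from collections import Counter

# Replica lists copied from the describe output in the issue (partitions 0-2).
replicas = {
    0: [14, 13, 12, 11, 15],
    1: [15, 14, 11],
    2: [11, 15, 12],
}


def displayed_replication_factor(assignment):
    # Assumed behavior, per the report: use the replica count of the
    # lowest-numbered partition as the topic's replication factor.
    first_partition = min(assignment)
    return len(assignment[first_partition])


def most_common_replica_count(assignment):
    # Hypothetical alternative: take the replica count shared by the most
    # partitions, so a single partition in flux does not skew the figure.
    counts = Counter(len(r) for r in assignment.values())
    return counts.most_common(1)[0][0]


print(displayed_replication_factor(replicas))
print(most_common_replica_count(replicas))
```

On the three partitions shown, the first function yields 5 and the second yields 3.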