Ryan Berdeen created KAFKA-1767:
-----------------------------------
Summary: /admin/reassign_partitions deleted before reassignment completes
Key: KAFKA-1767
URL: https://issues.apache.org/jira/browse/KAFKA-1767
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
Assignee: Neha Narkhede
The comment at
https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/controller/KafkaController.scala#L477-L517
describes the process of reassigning partitions. Specifically, by the time
{{/admin/reassign_partitions}} is updated to remove a partition, the newly
assigned replicas (RAR) should be in sync with the leader, and the assigned
replicas (AR) in ZooKeeper should already have been replaced with RAR:
{code}
4. Wait until all replicas in RAR are in sync with the leader.
...
10. Update AR in ZK with RAR.
11. Update the /admin/reassign_partitions path in ZK to remove this partition.
{code}
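To make the expected ordering concrete, here is a toy Python model of steps 4, 10 and 11. It is a sketch and not Kafka's code. The function name and the `isr_timeline` input are hypothetical: `isr_timeline` is a list of ISR snapshots standing in for what the controller observes over time. The model only records events, and it emits the removal of the admin entry last, after RAR is fully in sync and AR has been replaced.

```python
def reassign(rar, isr_timeline):
    """Toy model (hypothetical): events of steps 4, 10 and 11 in the documented order.

    rar:          the reassigned replicas (RAR)
    isr_timeline: successive ISR snapshots, a stand-in for what the controller sees
    """
    events = []
    for isr in isr_timeline:
        # Step 4: wait until all replicas in RAR are in sync with the leader.
        if set(rar) <= set(isr):
            events.append(f"ISR={sorted(isr)} covers RAR")
            break
        events.append(f"ISR={sorted(isr)} waiting")
    else:
        # RAR never caught up, so the admin entry must stay in place.
        return events
    events.append(f"AR={sorted(rar)}")   # Step 10: update AR in ZK with RAR
    events.append("remove admin entry")  # Step 11: only now drop the partition
    return events


print(reassign([3, 4], [[1, 2], [1, 2, 3], [1, 2, 3, 4]]))
print(reassign([3, 4], [[1, 2]]))
```

In the first call the entry is removed only after the third snapshot covers RAR. In the second call RAR never catches up, so the entry is never removed. The 0.8.1.1 behavior reported here amounts to emitting the removal while still in the "waiting" state.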
This worked in 0.8.1, but in 0.8.1.1 we observe {{/admin/reassign_partitions}}
being removed before step 4 has completed.
For example, if a partition has AR [1,2] and we then put [3,4] in
{{/admin/reassign_partitions}}, the partition has AR [1,2,3,4] and ISR [1,2]
at the moment the key is removed. Eventually, the AR is updated to [3,4].
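Put as a predicate, the state in this example fails the precondition for removing the key. The following is a minimal sketch. `reassignment_complete` is a hypothetical helper, not a Kafka API, and the ISR [3,4] in the second call is an assumption about the state once replicas 3 and 4 have caught up.

```python
def reassignment_complete(ar, isr, rar):
    """Hypothetical check: may the partition leave /admin/reassign_partitions?

    Mirrors steps 4 and 10 quoted above: every replica in RAR is in sync
    (RAR is a subset of ISR) and AR has already been replaced by RAR.
    """
    rar_in_sync = set(rar) <= set(isr)
    ar_updated = set(ar) == set(rar)
    return rar_in_sync and ar_updated


# State observed in 0.8.1.1 at the moment the key was removed:
print(reassignment_complete(ar=[1, 2, 3, 4], isr=[1, 2], rar=[3, 4]))  # False
# State after AR is eventually updated (assuming 3 and 4 are in sync):
print(reassignment_complete(ar=[3, 4], isr=[3, 4], rar=[3, 4]))  # True
```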
This means that the {{kafka-reassign-partitions.sh}} tool will accept a new
batch of reassignments before the current reassignments have finished, and our
own tool that feeds in reassignments in small batches (see KAFKA-1677) can't
rely on this key to detect active reassignments.
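Until this is fixed, a batching tool has to confirm completion itself rather than trusting the key alone. Below is a minimal sketch of such a guard. It assumes the tool remembers the target replicas it submitted and can read back each partition's AR and ISR, which are passed in here as plain dicts instead of ZooKeeper reads. `unfinished_partitions`, its arguments and the partition label `"t-0"` are all hypothetical.

```python
def unfinished_partitions(admin_entries, submitted, observed):
    """Return the partitions whose reassignment is still in flight (hypothetical helper).

    admin_entries: partitions still listed under /admin/reassign_partitions
    submitted:     partition -> target replica list the tool asked for
    observed:      partition -> (AR, ISR) read back for that partition
    """
    in_flight = set(admin_entries)
    for tp, target in submitted.items():
        ar, isr = observed[tp]
        # Don't trust the admin key alone: AR must equal the target (compared as
        # sets, ignoring replica order) and every target replica must be in the ISR.
        if set(ar) != set(target) or not set(target) <= set(isr):
            in_flight.add(tp)
    return in_flight


in_flight = unfinished_partitions(
    admin_entries=set(),                       # key already removed
    submitted={"t-0": [3, 4]},
    observed={"t-0": ([1, 2, 3, 4], [1, 2])},  # state from the example above
)
print(in_flight)  # {'t-0'}
```

With the state from the example above, the key is already gone but the partition is still reported as in flight, so the tool would hold back the next batch.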
Although we haven't observed this, it seems likely that if a controller
resignation happens, the new controller won't know that a reassignment is in
progress, and the AR will never be updated to the RAR.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)