Adi,

Thanks for the write-up. Here are my thoughts:
I think you are suggesting a way of automatically restoring a topic's replication factor in one specific scenario: permanent broker failures. I agree that the partition reassignment mechanism should be used to add replicas when they are lost to permanent broker failures. But I think the KIP probably bites off more than we can chew. Before we automate the detection of permanent broker failures and have the controller mitigate them through automatic data balancing, I'd like to point out that our current difficulty is not detection but the ability to generate a workable partition assignment for rebalancing data in a cluster. There are 2 problems with partition rebalancing today:

1. Lack of replica throttling for balancing data: In the absence of replica throttling, even if you come up with an assignment that might be workable, it isn't practical to kick it off without worrying about bringing the entire cluster down. I don't think the hack of moving partitions in batches is effective, as it is at best a guess.

2. Lack of support for policies in the rebalance tool that automatically generate a workable partition assignment: There is no easy way to generate a partition reassignment JSON file. An example of a policy is "end up with an equal number of partitions on every broker while minimizing data movement". Other policies might make sense as well; we'd have to experiment.

Broadly speaking, the data balancing problem comprises 3 parts:

1. Trigger: An event that causes data balancing to take place. KIP-46 suggests a specific trigger, permanent broker failure, but several other events might also make sense: cluster expansion, decommissioning brokers, data imbalance.

2. Policy: Given a set of constraints, generate a target partition assignment that can be executed when triggered.

3.
Mechanism: Given a partition assignment, make the state changes and actually move the data until the target assignment is achieved.

Currently, the trigger is manual through the rebalance tool, there is no support for any viable policy today, and we have a built-in mechanism that, given a policy and upon a trigger, moves data in a cluster but does not support throttling. Given that both the policy and the throttling improvement to the mechanism are hard problems, and given our past experience of operationalizing partition reassignment (it required months of testing before we got it right), I strongly recommend attacking this problem in stages. I think a more practical approach would be to add the concept of pluggable policies to the rebalance tool, implement a practical policy that generates a workable partition assignment when the tool is triggered, and improve the mechanism to support throttling so that a given policy can succeed without manual intervention. If we solved these problems first, the rebalance tool would be much more accessible to Kafka users and operators.

Assuming that we do this, the problem that KIP-46 aims to solve becomes much easier. You can separate the detection of permanent broker failures (trigger) from the mitigation (the above-mentioned improvements to data balancing). The latter will be a native capability in Kafka. Detecting permanent hardware failures is much more easily done via an external script that uses a simple health check (part 1 of KIP-46).

I agree that it will be great to *eventually* be able to fully automate both the trigger as well as the policies while also improving the mechanism. But I'm highly skeptical of big-bang approaches that go from a completely manual and cumbersome process to a fully automated one, especially when that involves large-scale data movement in a running cluster.
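To make the policy idea concrete, here is a rough Python sketch of the "equal number of partitions on every broker while minimizing data movement" policy. The in-memory assignment format, the greedy heuristic, the broker ids, and the function names are all made up for illustration; only the emitted JSON matches the format the reassignment tool accepts.

```python
import json

def propose_assignment(current, live_brokers):
    # Hypothetical policy sketch: even out per-broker replica counts while
    # moving as few replicas as possible. `current` maps (topic, partition)
    # -> replica list; replicas on dead brokers are always moved.
    load = {b: 0 for b in live_brokers}
    for replicas in current.values():
        for b in replicas:
            if b in load:
                load[b] += 1
    total = sum(len(r) for r in current.values())
    cap = -(-total // len(live_brokers))  # ceil(total / #brokers)

    proposed = {}
    for tp, replicas in sorted(current.items()):
        new = list(replicas)
        for i, b in enumerate(new):
            dead = b not in load
            if not dead and load[b] <= cap:
                continue  # broker is alive and not overloaded: keep replica
            # Move to the least-loaded live broker that does not already
            # hold a replica of this partition.
            candidates = [c for c in live_brokers if c not in new]
            if not candidates:
                continue
            dest = min(candidates, key=lambda c: load[c])
            if dead or load[dest] + 1 < load[b]:
                if not dead:
                    load[b] -= 1
                load[dest] += 1
                new[i] = dest
        proposed[tp] = new
    return proposed

def to_reassignment_json(proposed):
    # Render in the JSON format the partition reassignment tool accepts.
    return json.dumps({
        "version": 1,
        "partitions": [{"topic": t, "partition": p, "replicas": r}
                       for (t, p), r in sorted(proposed.items())],
    }, indent=2)
```

For example, with live brokers [1, 2, 3] and broker 4 permanently gone, every replica hosted on 4 is reassigned to whichever survivor is least loaded, while replicas already on live, non-overloaded brokers stay put. A real policy would also need rack awareness and size-based (not count-based) balancing; this sketch only illustrates the shape of the problem.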
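And to sketch the "external script with a simple health check" side (the trigger, part 1 of KIP-46): the probe, the retry counts, and the broker map below are hypothetical, not part of the KIP. The only claim is that this kind of detection can live outside the controller.

```python
import socket
import time

def broker_alive(host, port, timeout=2.0):
    # Cheapest possible liveness probe: can we open a TCP connection
    # to the broker's listener port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def detect_permanent_failures(brokers, probes=3, interval=1.0,
                              probe=broker_alive):
    # Declare a broker "permanently" failed only after `probes` consecutive
    # failed checks, so a transient blip doesn't trigger a large data move.
    # `brokers` maps broker id -> (host, port).
    failed = []
    for broker_id, (host, port) in sorted(brokers.items()):
        misses = 0
        for _ in range(probes):
            if probe(host, port):
                break
            misses += 1
            time.sleep(interval)
        if misses == probes:
            failed.append(broker_id)
    return failed
```

A cron job could run something like this and, for any broker it reports, invoke the (improved) rebalance tool with a generated reassignment, which keeps the failure-detection trigger entirely outside the controller.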
Once we stabilize these changes and feel confident that they work, we can push the policy into the controller and have it automatically be triggered based on different events.

Thanks,
Neha

On Tue, Feb 2, 2016 at 6:13 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

> Hey everyone,
>
> I just created a kip to discuss automated replica reassignment when we lose
> a broker in the cluster.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-46%3A+Self+Healing+Kafka
>
> Any feedback is welcome.
>
> Thanks,
> Aditya

--
Thanks,
Neha