Hi Kaushik,

As Anton pointed out (thanks, Anton), our Strimzi cluster operator fully automates the logic for migrating clusters from ZooKeeper to KRaft. You only have to apply a couple of annotations: one to start the process and one to finalize it once the cluster is in "dual-mode". This approach is specific to Strimzi, so it is of course not applicable if you are running your Kafka cluster on Kubernetes without it.
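As a rough illustration of the two annotations mentioned above, here is a minimal Python sketch that decides which annotation to apply next and builds the corresponding JSON merge patch for the `Kafka` custom resource. The function names are hypothetical. The annotation key `strimzi.io/kraft`, its values and the `kafkaMetadataState` status values are written as I understand them from the Strimzi documentation, so please check them against the Strimzi version you run.

```python
import json
from typing import Optional

# Annotation key as described in the Strimzi migration documentation.
KRAFT_ANNOTATION = "strimzi.io/kraft"


def next_annotation(metadata_state: str) -> Optional[str]:
    """Return the annotation value to apply for a given metadata state.

    `metadata_state` is assumed to be the value of `.status.kafkaMetadataState`
    on the Kafka custom resource. Returns None when the operator is still
    working and there is nothing for the user to do.
    """
    if metadata_state == "ZooKeeper":
        return "migration"  # start the migration
    if metadata_state == "KRaftPostMigration":
        return "enabled"  # finalize (irreversible)
    return None


def annotation_patch(value: str) -> str:
    """Build a JSON merge patch that sets the KRaft annotation."""
    return json.dumps({"metadata": {"annotations": {KRAFT_ANNOTATION: value}}})
```

The resulting patch could be applied with `kubectl patch --type merge` or with a Kubernetes client library. Everything in between the two annotations is handled by the operator.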
Thanks,
Paolo.

On Wed, 17 Jun 2026 at 12:13, Anton Agestam via dev <[email protected]> wrote:

> Hi Kaushik,
>
> I've spent a good part of the last year building and operating automated
> KRaft migrations for a large fleet of managed Kafka clusters with my team
> at Aiven, so I'll share what I think are the reusable parts of that
> experience.
>
> The short answer is that the automation is entirely feasible, and we run
> it online with no downtime. But the documented migration sequence
> (provision the controller quorum in migration mode, move the brokers
> through migration mode onto KRaft, then finalize) is not a sufficient
> guide for implementing this. It does tell you what to wait for between
> steps, but the signals it points to are operator-facing rather than
> programmatic: to know the metadata migration has completed, for example,
> it directs you to watch for an INFO log line on the active controller,
> "Completed migration of metadata from Zookeeper to KRaft", and to raise
> log verbosity to TRACE while the migration runs. What it does not
> describe is how to coordinate the rolling restarts and those waits across
> the nodes in code. Both are reasonable omissions for a runbook executed
> by a human operator, but not the best from the perspective of automating
> migrations.
>
> The parts that took the real work were these:
>
> Gating every step on the cluster being healthy first. A migration raises
> risk across the board, so before each transition we require the cluster
> to be in a boring, stable state, and we refuse to proceed while anything
> else is in flight: an in-progress partition reassignment, an unexpected
> node count, a controller quorum that has not fully formed. Waiting is
> always the safe default, and we implement alerts for when stages take
> unexpectedly long, so that a human operator can take a look.
>
> Coordinating the fan-out and fan-in. A side-car running for each node
> carries out its own reconfiguration and then confirms it actually took
> effect before recording itself as done. The confirmations come from
> signals the Kafka processes already expose: a broker reports whether it
> is in migration mode through the zkMigrationReady flag in its ApiVersions
> response, which the broker only sets once it has a valid migration
> configuration; the controller quorum reports its progress through the
> ZkMigrationState metric, which we wait to see settle into the dual-write
> migration state, and we confirm the quorum itself is formed and caught up
> by reading its voter set and per-voter replication lag from
> DescribeQuorum. A single elected node advances the shared migration state
> only once every node has confirmed the current step. That state lives in
> a store all the nodes agree on, and every advance is a compare-and-swap,
> so the coordination stays correct even if two nodes briefly believe they
> are in charge.
>
> | Stage | Inspected metric / response | Expected value |
> |-------|-----------------------------|----------------|
> | Controller quorum provisioned | DescribeQuorum | all expected voters present, not lagging |
> | Brokers in migration mode | broker `ApiVersions.zkMigrationReady`; controller `ZkMigrationState` | `zkMigrationReady=true` on every broker; `ZkMigrationState=MIGRATION` |
> | Brokers migrated to KRaft | broker `ApiVersions.zkMigrationReady` | `zkMigrationReady=false` on every broker |
> | Migration finalized | controller `ZkMigrationState` | `ZkMigrationState=POST_MIGRATION` |
>
> We keep a reversible window. Once every broker is running on KRaft while
> the controllers keep writing metadata back to ZooKeeper (dual-write
> phase), the migration is fully functional yet still reversible. We
> deliberately hold the cluster in this phase for a while before the
> irreversible finalize step, which gives a real opportunity to observe
> production workload and allows operators to roll back a cluster if
> needed.
>
> We treat topology change as a first-class concern. Nodes can be added or
> replaced in the middle of a migration, and that interacts with both the
> forward sequence and the rollback path in ways that are easy to get
> wrong. We treat topology changes as supported at every stage, but one
> recent case shows how subtle this is: a node replacement during the
> migration turns out to be incompatible with rolling back, because
> formatting a replacement node's storage for KRaft produces a
> meta.properties file that cannot afterwards be reverted to ZooKeeper
> mode. This was in general the trickiest part to get right, and we rely a
> lot on compare-and-swap guarantees from a shared KV store for correctness
> in our implementation. When node replacements happen during the migration
> sequence, in general we wait for that node replacement to complete in
> whichever stage the migration is, and continue only once the cluster is
> fully stable again.
>
> For the Kubernetes-operator case you mention, we don't run that model
> ourselves, so I'll avoid prescribing specifics, though I believe Strimzi
> already supports automated KRaft migration. It is likely worth checking
> out: https://strimzi.io/blog/2024/03/22/strimzi-kraft-migration/.
>
> One concrete contribution that may be valuable to the community is
> extending the documentation to describe the means already available for
> programmatically verifying that each stage has completed: the broker and
> controller signals above, rather than observing log lines.
>
> It may also be worth noting that we use KIP-853 membership changes. We
> are really happy with that choice and find in general that it provides
> really great stability to our operations. Where we routinely have issues
> with operating ZK, we do not see that with KRaft.
>
> I am glad to see this discussed, and I'm happy to share more about our
> approach if it's valuable to the community. I'm also planning a more
> detailed write-up of our design and will share it on this list when it's
> ready.
>
> BR,
> Anton
>
> Den ons 17 juni 2026 kl 09:33 skrev Kaushik Srinivas (Nokia) via dev <
> [email protected]>:
>
> > Hi Kafka team,
> > With the recent changes to Apache Kafka movement towards kraft from zk
> > involving migration, if there is a workable solution to automate this
> > migration, is the community open to such proposals ?
> > What is the community's view of such automation and sharing across the
> > kafka community ? specifically for k8s deployments of kafka where the
> > brokers are managed via workload k8s controllers.
> > -Kaushik.
> >
>

--
Paolo Patierno
*Senior Principal Software Engineer @ IBM*
*CNCF Ambassador*
Twitter : @ppatierno <http://twitter.com/ppatierno>
Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
GitHub : ppatierno <https://github.com/ppatierno>
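
The fan-in scheme described in Anton's message above (each node confirms its own step, and a single elected node advances the shared state with a compare-and-swap) could be sketched roughly as follows. This is not Aiven's implementation: `KVStore`, `try_advance`, the step names and the key layout are all hypothetical, and the in-memory store only stands in for a real shared KV store with compare-and-swap support.

```python
import threading

# Hypothetical step names, following the stages listed in the table above.
STEPS = [
    "provision_controller_quorum",
    "brokers_to_migration_mode",
    "brokers_to_kraft",
    "finalize",
]


class KVStore:
    """In-memory stand-in for a shared KV store offering compare-and-swap."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._data = dict(initial or {})

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def compare_and_swap(self, key, expected, new):
        """Set key to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def confirm_step(store, node, step_index):
    """Called by a node's side-car after verifying its own step took effect."""
    store.put(f"confirmed/{node}", step_index)


def try_advance(store, expected_nodes):
    """Advance the shared step if every expected node confirmed the current one.

    Returns True only for the caller whose compare-and-swap succeeded, so two
    nodes that both believe they are in charge cannot advance the step twice.
    """
    current = store.get("step")
    if current >= len(STEPS):
        return False  # all steps already completed
    for node in expected_nodes:
        if store.get(f"confirmed/{node}") != current:
            return False  # waiting is the safe default
    return store.compare_and_swap("step", current, current + 1)
```

A coordinator would call `try_advance` in a loop alongside its health checks. A stale coordinator that read the old step value simply sees its compare-and-swap fail and re-reads the state.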
