kaushik srinivas created KAFKA-16370:
----------------------------------------
Summary: offline rollback procedure from kraft mode to zookeeper
mode.
Key: KAFKA-16370
URL: https://issues.apache.org/jira/browse/KAFKA-16370
Project: Kafka
Issue Type: Improvement
Reporter: kaushik srinivas
From the KIP
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]:
h2. Finalizing the Migration
Once the cluster has been fully upgraded to KRaft mode, the controller will
still be running in migration mode and making dual writes to KRaft and ZK.
Since the data in ZK is still consistent with that of the KRaft metadata log,
it is still possible to revert back to ZK.
*_The time that the cluster is running all KRaft brokers/controllers, but still
running in migration mode, is effectively unbounded._*
Once the operator has decided to commit to KRaft mode, the final step is to
restart the controller quorum and take it out of migration mode by setting
_zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active
controller will only finalize the migration once it detects that all members of
the quorum have signaled that they are finalizing the migration (again, using
the tagged field in ApiVersionsResponse). Once the controller leaves migration
mode, it will write a ZkMigrationStateRecord to the log and no longer perform
writes to ZK. It will also disable its special handling of ZK RPCs.
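To illustrate the finalization step quoted above (this is not part of the KIP text), here is a minimal sketch that rewrites the contents of a controller properties file so that _zookeeper.metadata.migration.enable_ is "false". The helper name `finalize_migration_flag` is hypothetical, and the sketch assumes the file uses plain `key=value` lines. The controller quorum would still need to be restarted afterwards, as the KIP describes.

```python
MIGRATION_FLAG = "zookeeper.metadata.migration.enable"


def finalize_migration_flag(properties_text: str) -> str:
    """Return the properties text with the migration flag set to false.

    Hypothetical helper: it only edits text and does not restart anything.
    """
    out = []
    seen = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if not stripped.startswith("#") and key == MIGRATION_FLAG:
            # Replace the existing setting, whatever its current value is.
            out.append(f"{MIGRATION_FLAG}=false")
            seen = True
        else:
            # Keep comments and unrelated settings untouched.
            out.append(line)
    if not seen:
        # The KIP also allows unsetting the flag; here it is set explicitly.
        out.append(f"{MIGRATION_FLAG}=false")
    return "\n".join(out) + "\n"
```

For example, passing in text that contains `zookeeper.metadata.migration.enable=true` returns the same text with that one line changed to `=false`.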
*At this point, the cluster is fully migrated and is running in KRaft mode. A
rollback to ZK is still possible after finalizing the migration, but it must be
done offline and it will cause metadata loss (which can also cause partition
data loss).*
We tried this on a Kafka cluster that had been migrated from ZooKeeper to
KRaft mode. We observed that the rollback works if the "/controller" node in
ZooKeeper is deleted before the rollback from KRaft mode to ZooKeeper mode is
performed.
The KIP excerpt above indicates that a rollback from KRaft to ZK is still
possible after the migration is finalized, provided it is done offline. Are
there any known steps that must be carried out as part of this offline
rollback?
From our experience, the only step we currently know of is deleting the
"/controller" node in ZooKeeper, which forces one of the ZooKeeper-based
brokers to be elected as the new controller after the rollback is done. Are
there any additional steps or actions beyond this?
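For reference, the "/controller" deletion step described above could be scripted roughly as follows. This is only a sketch of that single step, not a complete or verified rollback procedure. It assumes the `zookeeper-shell.sh` script shipped with Kafka is available at `bin/zookeeper-shell.sh` and that ZooKeeper is reachable at the given connect string; both values, and the function names, are placeholders.

```python
import subprocess


def build_delete_controller_cmd(zk_connect, shell="bin/zookeeper-shell.sh"):
    """Build the command line that deletes the /controller znode.

    `shell` and `zk_connect` are assumptions about the local installation.
    """
    return [shell, zk_connect, "delete", "/controller"]


def delete_controller_znode(zk_connect, shell="bin/zookeeper-shell.sh",
                            dry_run=True):
    """Run the delete command, or only return it when dry_run is True."""
    cmd = build_delete_controller_cmd(zk_connect, shell)
    if dry_run:
        # Nothing is executed; the caller can inspect or log the command.
        return cmd
    # check=True raises CalledProcessError if the shell exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `delete_controller_znode("localhost:2181")` with the default `dry_run=True` only returns the command that would be run, which is a safer default for a destructive operation.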
--
This message was sent by Atlassian Jira
(v8.20.10#820010)