Re: [DISCUSS] KIP-1347: Overriding voter set on storage formatting

José Armando García Sancio via dev Tue, 19 May 2026 02:36:09 -0700

Hi Paolo,

Thanks a lot for the KIP. This feature would be very helpful to let
users recover their Kafka clusters. This is a partial review, as I wanted
to give you some feedback as soon as possible.


JS1
> Furthermore, there is no safe recovery from majority loss. For example, if 2 
> of 3 controllers are permanently gone, you cannot update the VotersRecord and 
> must re-bootstrap with data loss.

If the user loses 2 out of 3 controllers, metadata loss is possible.
Kafka cannot recover from metadata loss. For example, if the metadata
loss includes the leader epoch or ISR/ELR, Kafka cannot recover from
those cases without additional data loss.

JS2
I am wondering if we should have a tool specific to these use cases
instead of reusing the kafka-storage tool. I like etcd's CLI
organization. They have etcdctl which communicates with an active
cluster. They have etcdutl which recovers an inactive cluster. In our
case it would be beneficial to have a tool specific to recovering an
inactive cluster. How about naming it kafka-recovery? I will use the
CLI name in the rest of my response but I am open to name suggestions.

JS3
What do you think of including a section on how to use the tool? When
we document this tool/feature, we can copy that section to the Kafka
documentation. From my perspective, this is what users would need to do
to use this tool:
1. Shut down all controllers.
2. Pick the controller that has the longest cluster metadata log. The
controller with the longest log is guaranteed to have all of the
committed data. They would need a command like "kafka-recovery
metadata log-length (--metadata-log-dir|--config)". This command would
print the log end epoch and offset so that the user can compare them
with the other controllers.
3. On the controller with the longest cluster metadata log, generate
the latest snapshot if one doesn't already exist. The user can back up
this snapshot in case they recover it incorrectly. E.g.
"kafka-recovery metadata generate-checkpoint
(--metadata-log-dir|--config)".
4. Recover the controller's default endpoint or listener. I think we
should limit this functionality to recovering only the default controller
listener. The default controller listener is the first listener in
"controller.listener.names". This is the listener that Kafka uses for
outgoing connections and RPCs to the controllers. E.g. "kafka-recovery
metadata override-endpoint --endpoint 0@host:port --endpoint
1@host:port ... --config ...". The command would only override the
endpoints specified. E.g. if there are 3 controllers but the user only
overrides one endpoint, the tool will only fix that one endpoint. What
are your thoughts?
5. Copy the generated checkpoint to all the controllers and brokers.
Copying the generated checkpoint to all controllers and brokers is
slightly inconvenient. The issue is that KRaft won't replicate this
checkpoint if the replicas (controllers and brokers) have already
replicated up to the leader's log start offset.
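
To make the comparison in step 2 concrete, here is a rough sketch in
Python. It assumes the usual Raft ordering, where the log end epoch is
compared first and the log end offset only breaks ties. The LogEnd type,
the pick_longest helper and the sample values are made up for
illustration; they are not output from any existing tool.

```python
from typing import NamedTuple


class LogEnd(NamedTuple):
    """Log end epoch and offset as reported for one controller (hypothetical shape)."""
    epoch: int
    offset: int


def pick_longest(log_ends: dict[int, LogEnd]) -> int:
    """Return the id of the controller with the longest metadata log.

    Assumption: a higher epoch wins, and the offset only breaks ties.
    NamedTuple instances compare field by field, which gives that ordering.
    """
    return max(log_ends, key=lambda node_id: log_ends[node_id])


# Illustrative values only, one entry per controller id.
log_ends = {
    0: LogEnd(epoch=7, offset=1200),
    1: LogEnd(epoch=8, offset=900),
    2: LogEnd(epoch=8, offset=950),
}
print(pick_longest(log_ends))
```

Note that controller 0 has the largest offset in this example but is not
picked, because its log ends in an older epoch.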

As an alternative to step 5, they would have to run "kafka-recovery metadata
override-endpoint --endpoint 0@host:port --endpoint 1@host:port ...
--config ..." on all of the replicas. Running this command on all
replicas is problematic because the voter set might differ across
nodes due to dynamic voters/controllers.
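
For the "only override the endpoints specified" behavior in step 4, this
is a small Python sketch of the semantics I have in mind. It models the
voter set as a plain dict from node id to (host, port), which is a
simplification and not how the voter set is actually stored. The
parse_endpoint and override_endpoints names are hypothetical, and
rejecting ids that are not already voters is an assumption of the sketch
rather than something the KIP specifies.

```python
def parse_endpoint(spec: str) -> tuple[int, str, int]:
    """Parse an "id@host:port" argument into (node id, host, port)."""
    node_id, _, address = spec.partition("@")
    host, _, port = address.rpartition(":")
    return int(node_id), host, int(port)


def override_endpoints(
    voters: dict[int, tuple[str, int]], specs: list[str]
) -> dict[int, tuple[str, int]]:
    """Return a copy of the voter set with only the listed endpoints replaced."""
    updated = dict(voters)
    for spec in specs:
        node_id, host, port = parse_endpoint(spec)
        if node_id not in updated:
            # Assumption: the command fixes endpoints of existing voters
            # and does not add new voters.
            raise ValueError(f"node {node_id} is not in the voter set")
        updated[node_id] = (host, port)
    return updated


# Illustrative voter set; host names and ports are made up.
voters = {0: ("old-a", 9093), 1: ("old-b", 9093), 2: ("old-c", 9093)}
result = override_endpoints(voters, ["1@new-b:9094"])
print(result)
```

Only the entry for controller 1 changes; the other two voters keep their
existing endpoints.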

Thanks,
--
-José
