Hi Paolo,

Thanks a lot for the KIP. This feature would be very helpful for letting users recover their Kafka clusters. This is a partial review; I wanted to give you some feedback as soon as possible.
JS1
> Furthermore, there is no safe recovery from majority loss. For example, if 2
> of 3 controllers are permanently gone, you cannot update the VotersRecord and
> must re-bootstrap with data loss.

If the user loses 2 out of 3 controllers, metadata loss is possible, and Kafka cannot recover from metadata loss. For example, if the lost metadata includes the leader epoch or ISR/ELR, Kafka cannot recover from those cases without additional data loss.

JS2
I am wondering if we should have a tool specific to these use cases instead of reusing the kafka-storage tool. I like etcd's CLI organization: etcdctl communicates with an active cluster, while etcdutl recovers an inactive cluster. In our case, it would be beneficial to have a tool specific to recovering an inactive cluster. How about naming it kafka-recovery? I will use this name in the rest of my response, but I am open to other suggestions.

JS3
What do you think of including a section on how to use the tool? When we document this tool/feature, we can copy that section into the Kafka documentation. From my perspective, this is what users need to do:

1. Shut down all controllers.
2. Pick the controller that has the longest cluster metadata log. The controller with the longest log is guaranteed to have all of the committed data. Users would need a command like:
     kafka-recovery metadata log-length (--metadata-log-dir|--config)
   This command would print the log end epoch and offset so that the user can compare them across the controllers.
3. On the controller with the longest cluster metadata log, generate the latest snapshot if one doesn't already exist. The user can back up this snapshot in case the recovery goes wrong. E.g.:
     kafka-recovery metadata generate-checkpoint (--metadata-log-dir|--config)
4. Recover the controller's default endpoint or listener. I think we should limit this functionality to recovering only the default controller listener.
   The default controller listener is the first listener in "controller.listener.names". This is the listener that Kafka uses for outgoing connections and RPCs to the controllers. E.g.:
     kafka-recovery metadata override-endpoint --endpoint 0@host:port --endpoint 1@host:port ... --config ...
   The command would only override the specified endpoints. E.g., if there are 3 controllers but the user only overrides one endpoint, the tool would only update that one endpoint. What are your thoughts?
5. Copy the generated checkpoint to all the controllers and brokers. This is slightly inconvenient, but it is needed because KRaft won't replicate this checkpoint to replicas (controllers and brokers) that have already replicated up to the leader's log start offset. As an alternative to step 5, users would have to run
     kafka-recovery metadata override-endpoint --endpoint 0@host:port --endpoint 1@host:port ... --config ...
   on all of the replicas. Running this command on all replicas is problematic because the voter set might differ across nodes due to dynamic voters/controllers.

Thanks,
--
-José
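The following is a minimal sketch of what "longest log" means in step 2. It assumes the proposed log-length command reports each controller's log end epoch and offset. Following the usual Raft up-to-date rule, the epoch is compared first and the offset only breaks ties. A log with a higher epoch therefore wins even if its end offset is smaller. `LogEnd` and `most_up_to_date` are hypothetical names for illustration, not existing Kafka code.

```python
from typing import NamedTuple


class LogEnd(NamedTuple):
    """End of one controller's cluster metadata log.

    Hypothetical model of what the proposed `log-length` command would print.
    """
    node_id: int
    epoch: int   # epoch of the last record in the log
    offset: int  # log end offset


def most_up_to_date(log_ends: list[LogEnd]) -> LogEnd:
    """Return the controller whose log is the most up to date.

    Raft-style ordering: a higher epoch wins; for equal epochs,
    the larger end offset wins.
    """
    return max(log_ends, key=lambda end: (end.epoch, end.offset))
```

For example, given `LogEnd(1, 5, 1200)`, `LogEnd(2, 6, 900)` and `LogEnd(3, 6, 950)`, controller 3 is picked. Controller 1 has the largest offset but an older epoch.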
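The following is a rough sketch of the step 4 semantics ("only override the specified endpoints"). It models the voter set as a map from voter ID to the `(host, port)` of the default controller listener, which is a simplification of the VotersRecord. All function names are hypothetical. Rejecting unknown voter IDs, rather than adding them, is an assumption and is not stated in the proposal.

```python
def parse_endpoint(spec: str) -> tuple[int, str, int]:
    """Parse an --endpoint value of the form `id@host:port`."""
    node_id, _, address = spec.partition("@")
    host, _, port = address.rpartition(":")
    if not node_id or not host or not port:
        raise ValueError(f"expected id@host:port, got {spec!r}")
    return int(node_id), host, int(port)


def default_controller_listener(controller_listener_names: str) -> str:
    """The default controller listener is the first entry in controller.listener.names."""
    return controller_listener_names.split(",")[0].strip()


def override_endpoints(
    voters: dict[int, tuple[str, int]], specs: list[str]
) -> dict[int, tuple[str, int]]:
    """Return a copy of `voters` in which only the specified voters' endpoints change."""
    updated = dict(voters)  # leave the input voter set untouched
    for spec in specs:
        node_id, host, port = parse_endpoint(spec)
        # Assumption: unknown IDs are rejected; adding voters is out of scope.
        if node_id not in updated:
            raise ValueError(f"node {node_id} is not a voter")
        updated[node_id] = (host, port)
    return updated
```

With three voters and a single `--endpoint 1@new-1:19093`, only voter 1 changes. Voters 0 and 2 keep their existing endpoints.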
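The replication issue behind step 5 can be modeled with a simplified sketch. It assumes a follower is directed to the leader's snapshot only when the offset it wants to fetch is below the leader's log start offset, and it ignores log divergence and truncation. Every other replica keeps fetching log records and never sees the regenerated checkpoint, so that checkpoint would have to be copied by hand. The function name and the dict-of-fetch-offsets input are hypothetical.

```python
def replicas_needing_manual_copy(
    fetch_offsets: dict[int, int], leader_log_start_offset: int
) -> list[int]:
    """Return the IDs of replicas that will NOT fetch the leader's snapshot.

    Simplified model: a replica fetches the snapshot only if its fetch
    offset is below the leader's log start offset.
    """
    return sorted(
        replica_id
        for replica_id, fetch_offset in fetch_offsets.items()
        if fetch_offset >= leader_log_start_offset
    )
```

In practice most healthy replicas are caught up past the log start offset, which is why step 5 asks users to copy the checkpoint everywhere.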
