Re: KRaft controller disaster recovery

José Armando García Sancio Tue, 08 Apr 2025 13:27:26 -0700

Hi Luke and Colin,

On Mon, Apr 7, 2025 at 10:29 PM Luke Chen <show...@gmail.com> wrote:
> That's why we were discussing if there's any way to "force" recover the
> scenario, even if it's possible to have data loss.


Yes, there is a way. They need to configure a controller cluster that
matches the voter set in the cluster metadata partition. That means a
controller cluster whose node ids and directory ids match the voter
set, and whose snapshot and log segments match a consistent cluster
metadata partition. They can do that manually today. I think that
Colin is suggesting a tool to make this easier.
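
To make the "matches the voter set" condition concrete, here is a
minimal sketch of the comparison, assuming the voter set and the
candidate controllers have already been read out as (node id,
directory id) pairs. The Voter type and voter_set_mismatches function
are hypothetical names for illustration, not part of Kafka or any
existing tool, and the sketch does not check the snapshot or log
segments.

```python
from typing import Iterable, NamedTuple


class Voter(NamedTuple):
    """Hypothetical record: one controller as identified in the voter set."""
    node_id: int
    directory_id: str


def voter_set_mismatches(expected: Iterable[Voter], candidate: Iterable[Voter]) -> dict:
    """Compare the voter set from the metadata partition with a candidate cluster.

    Both the node id and the directory id must match, so a controller with
    the right node id but a freshly formatted directory counts as a mismatch.
    """
    expected_set, candidate_set = set(expected), set(candidate)
    return {
        "missing": sorted(expected_set - candidate_set),
        "unexpected": sorted(candidate_set - expected_set),
    }


if __name__ == "__main__":
    # Illustrative values only; real directory ids come from the cluster.
    voters = [Voter(1, "dir-a"), Voter(2, "dir-b"), Voter(3, "dir-c")]
    rebuilt = [Voter(1, "dir-a"), Voter(2, "dir-b"), Voter(3, "dir-new")]
    print(voter_set_mismatches(voters, rebuilt))
```

An empty "missing" and an empty "unexpected" list is only a necessary
condition here, since the on-disk snapshot and log segments still have
to be consistent.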

The user should understand that these manual operations are extremely
dangerous and can result in data loss in the cluster metadata
partition. A Kafka cluster cannot recover from loss of data in the
cluster metadata partition. For example, partition leader epochs can
decrease because of data loss in the cluster metadata partition and
Kafka brokers don't handle decreasing partition leader epochs.
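
As an illustration of the leader epoch problem, the following is a
minimal sketch that flags partitions whose leader epoch in the
recovered metadata is lower than the epoch a broker already knows.
The function name and the dictionary inputs are assumptions made for
this example; they do not correspond to a Kafka API, and how the two
maps would be collected is left open.

```python
def regressed_partitions(broker_epochs: dict, recovered_epochs: dict) -> dict:
    """Return partitions whose leader epoch went backwards after recovery.

    broker_epochs:    partition name -> leader epoch the broker last saw
    recovered_epochs: partition name -> leader epoch in the recovered metadata
    """
    regressed = {}
    for partition, known in broker_epochs.items():
        recovered = recovered_epochs.get(partition)
        # A partition missing from the recovered metadata is a separate
        # problem; only report epochs that actually decreased.
        if recovered is not None and recovered < known:
            regressed[partition] = (known, recovered)
    return regressed


if __name__ == "__main__":
    # Illustrative values only.
    before = {"orders-0": 5, "orders-1": 3}
    after = {"orders-0": 4, "orders-1": 3}
    print(regressed_partitions(before, after))
```

Any non-empty result would indicate that the recovered metadata
partition lost records that brokers had already acted on.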

If the user doesn't understand KRaft's protocol to some degree, it is
unlikely that they can blindly follow a set of instructions and be
successful in their recovery.

I am hesitant to give users the impression that Kafka can tolerate and
recover from data loss in the cluster metadata partition.

What do you think?
-- 
-José
