On Mon, Apr 7, 2025, at 19:29, Luke Chen wrote:
> Hi Jose and Colin,
>
> Thanks for your explanation!
>
> Yes, we all agree that a 3-node quorum can only tolerate 1 node down.
> We just want to discuss the "what if": if 2 out of 3 nodes are down at
> the same time, what can we do?
> Currently, the result is that the quorum will never form and the whole
> Kafka cluster is basically unavailable.
> That's why we were discussing whether there's any way to "force" a
> recovery in that scenario, even if it's possible to have data loss.
>
Hi Luke,

Just for the benefit of those reading this... in a real-world scenario
like this, I would expect the admin to copy the metadata to a second
controller node and bring that node up, to restore the quorum.

Maybe I'm missing some constraint here like "we can't reuse the same
hostname / port ever again" which makes this impossible. I think that
constraint would be extremely rare in practice. For example, if you use
Kubernetes, you control DNS yourself, so you can always give a new node
the same DNS address again if needed. However, it is worth considering
these unusual scenarios.

> > Perhaps something like "format with existing metadata"? If we did
> > something like that, we should probably make it a separate tool from
> > the formatting tool, and explicitly make it interactive (requires you
> > to type "YES" on the console or something), since I DON'T want the
> > folks making docker images and so on to do this.
>
> Sounds like a good idea!
>

It would be good to understand the scenario we're trying to solve a bit
more. I'm thinking it's something like "no controller nodes exist, and we
want to stand up a new quorum with some existing metadata"?

best,
Colin

> Thanks.
> Luke
>
>
>
> On Tue, Apr 8, 2025 at 5:49 AM Colin McCabe <cmcc...@apache.org> wrote:
>
>> Hi José,
>>
>> I think you make a valid point that our guarantees here are not
>> actually different from ZooKeeper. In both systems, if you lose quorum,
>> you will probably lose some data. Of course, how much data you lose
>> depends on luck. If the last node standing was the active controller /
>> ZooKeeper, then you got lucky.
>>
>> This is why a lot of people run with 5- or 7-node quorums in both
>> systems. The redundancy is useful.
>>
>> I do think it would be nice to document some way of "forcing" the
>> quorum into a specific configuration for data-loss scenarios like this.
>> This could also be used in the case where we lost 100% of the
>> controllers. The brokers have a metadata snapshot and metadata log, so
>> in an emergency you could grab the metadata from there.
>>
>> Perhaps something like "format with existing metadata"? If we did
>> something like that, we should probably make it a separate tool from
>> the formatting tool, and explicitly make it interactive (requires you
>> to type "YES" on the console or something), since I DON'T want the
>> folks making docker images and so on to do this.
>>
>> best,
>> Colin
>>
>>
>> On Mon, Apr 7, 2025, at 14:26, José Armando García Sancio wrote:
>> > Thanks Luke.
>> >
>> > On Thu, Apr 3, 2025 at 7:14 AM Luke Chen <show...@gmail.com> wrote:
>> >> In addition to the approaches you provided, maybe we can have a way
>> >> to "force" KRaft to honor the "controller.quorum.voters" config,
>> >> instead of "controller.quorum.bootstrap.servers", even if it's in
>> >> kraft.version 1.
>> >
>> > Small clarification. In KIP-595, controller.quorum.voters was playing
>> > two roles: 1) the set of voters used by KRaft during HWM calculation
>> > and leader election, and 2) the set of endpoints used by observers
>> > (brokers) to discover the leader (active controller).
>> >
>> > In KIP-853, I split that functionality. The set of voters was moved
>> > to control records (VotersRecord) in the cluster metadata partition.
>> > The bootstrap endpoints used by observers/brokers to discover the
>> > leader were moved to the controller.quorum.bootstrap.servers
>> > property.
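>> >
>> > For illustration, with static voters (kraft.version 0) the voter set
>> > is defined entirely in the configuration, something like this (the
>> > node IDs, hostnames and ports below are only placeholders):
>> >
>> >   controller.quorum.voters=1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
>> >
>> > whereas with KIP-853 the voter set lives in the cluster metadata
>> > partition and the configuration only needs the discovery endpoints:
>> >
>> >   controller.quorum.bootstrap.servers=controller-1:9093,controller-2:9093,controller-3:9093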
>> >
>> > I say that because Kafka supports using both configurations at the
>> > same time. This is useful when upgrading from kraft.version 0 to
>> > kraft.version 1. The upgrade process is roughly as follows:
>> > 1. Add controller.quorum.bootstrap.servers to all of the nodes
>> > (controllers and brokers).
>> > 2. Upgrade the kraft.version from 0 to 1.
>> > 3. Monitor the "ignored-static-voters" metric and remove
>> > controller.quorum.voters when the metric is 1.
>> >
>> > Thanks,
>> > --
>> > -José
>>
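
For reference, step 2 of José's outline above would normally be done with
the features tool. Roughly something like this (the bootstrap address is
just a placeholder):

  bin/kafka-features.sh --bootstrap-server localhost:9092 \
    upgrade --feature kraft.version=1

The same tool's "describe" subcommand can be used afterwards to confirm
that kraft.version has been finalized at 1.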