Hi Dani,

I believe the limitation that this documentation is hinting at is the
motivation for KIP-996 [1], and the notice in the documentation would
be removed once KIP-996 lands.
The KIP gives a brief explanation and links to a more in-depth
description of the failure scenario.

While a 3-node quorum would typically be less reliable or available
than a 5-node quorum, it happens to be resistant to this failure mode,
which turns the additional controllers into liabilities instead of
assets.
In the judgement of the maintainers at least, the risk of a network
partition that could trigger unavailability in a 5-node quorum is
higher than the risk of a 2-controller failure in a 3-node quorum, so
3-node quorums are recommended.
You could do your own analysis and practical testing to evaluate this
tradeoff for your own network environment.
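
To make the crash-fault side of that tradeoff concrete, here is a
minimal Python sketch of the majority arithmetic. It only counts
outright controller failures and does not model the partial-partition
scenario described in the KIP. The function names are my own for
illustration, not anything from Kafka.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Controllers that can fail outright while a majority remains."""
    return voters - majority(voters)


# Compare the two quorum sizes discussed in this thread.
for n in (3, 5):
    print(
        f"{n} controllers: majority={majority(n)}, "
        f"tolerated failures={tolerated_failures(n)}"
    )
```

A 3-node quorum survives one controller failure, so losing 2
controllers makes it unavailable. A 5-node quorum survives two.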

I hope this helps!
Greg

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote

On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz
<daniel.s...@shopify.com.invalid> wrote:
>
> Hello,
>
> I would like to clarify a statement I found in the KRaft documentation, in
> the deployment section [1]:
>
> > More than 3 controllers is not recommended in critical environments. In
> the rare case of a partial network failure it is possible for the cluster
> metadata quorum to become unavailable. This limitation will be addressed in
> a future release of Kafka.
>
> I would like to clarify what it's meant by that sentence, as intuitively I
> don't see why 3 replicas would be better than 5 (or more) for fault
> tolerance.
> What is the current limitation this is referring to?
>
> Thanks a lot.
>
>
> Cheers,
>
> Dani
>
> [1] https://kafka.apache.org/36/documentation.html#kraft_deployment
