Hi Dani,

I believe the limitation that this documentation is hinting at is the motivation for KIP-996 [1], and the notice in the documentation would be removed once KIP-996 lands. You can read the KIP for a brief explanation and a link to a more in-depth explanation of the failure scenario.
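As background for the 3-versus-5 comparison, here is a minimal sketch of the usual majority arithmetic for a voting quorum. It is an illustration only, not Kafka code; the function names `majority` and `tolerated_failures` are made up for this example.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - majority(voters)


for n in (3, 5):
    print(f"{n} controllers: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
# prints:
# 3 controllers: majority=2, tolerated failures=1
# 5 controllers: majority=3, tolerated failures=2
```

This only counts clean controller failures. It does not model the partial network failure that the documentation notice and KIP-996 are about.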
A 3-node quorum would typically be less reliable or available than a 5-node quorum, but it happens to be resistant to this failure mode. Under this failure mode, the additional controllers are liabilities instead of assets. In the judgement of the maintainers at least, the risk of a network partition which could trigger unavailability in a 5-node quorum is higher than the risk of a 2-controller failure in a 3-node quorum, so 3-node quorums are recommended. You could do your own analysis and practical testing to weigh this tradeoff for your own network context.

I hope this helps!
Greg

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote

On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz <daniel.s...@shopify.com.invalid> wrote:
>
> Hello,
>
> I would like to clarify a statement I found in the KRaft documentation, in
> the deployment section [1]:
>
> > More than 3 controllers is not recommended in critical environments. In
> > the rare case of a partial network failure it is possible for the cluster
> > metadata quorum to become unavailable. This limitation will be addressed in
> > a future release of Kafka.
>
> I would like to clarify what it's meant by that sentence, as intuitively I
> don't see why 3 replicas would be better than 5 (or more) for fault
> tolerance.
> What is the current limitation this is referring to?
>
> Thanks a lot.
>
>
> Cheers,
>
> Dani
>
> [1] https://kafka.apache.org/36/documentation.html#kraft_deployment