Thanks a lot folks, this is really helpful.

> I believe the limitation that this documentation is hinting at is the
> motivation for KIP-996
I'll make sure to check out KIP-996 and the references linked there.
Thanks for the summary as well, I really appreciate it.

Cheers,
Dani

On Tue, Feb 6, 2024 at 6:52 PM Michael K. Edwards <m.k.edwa...@gmail.com> wrote:
>
> A 5-node quorum doesn't make a lot of sense in a setting where those
> nodes are also Kafka brokers. When they're ZooKeeper voters, a quorum*
> of 5 makes a lot of sense, because you can take an unscheduled voter
> failure during a rolling-reboot scheduled maintenance without
> significant service impact. You can also spread the ZK quorum across
> multiple AZs (or your cloud's equivalent), which I would rarely
> recommend doing with Kafka.
>
> The trend in Kafka development and deployment is towards KRaft, and
> there is probably no percentage in bucking that trend. Just don't
> expect it to cover every "worst realistic case" scenario that a
> ZK-based deployment can.
>
> Scheduled maintenance on an (N+2 for read integrity, N+1 to stay
> writable) system adds vulnerability, and that's just something you have
> to build into your risk model. N+1 is good enough for finely
> partitioned data in any use case that Kafka fits, because resilvering
> after a maintenance or a full broker loss is highly parallel. N+1 is
> also acceptable for consumer group coordinator metadata, as long as you
> tune for aggressive compaction; I haven't looked at whether the
> coordinator code does a good job of parallelizing metadata replay, but
> if it doesn't, there's no real difficulty in fixing that. For global
> metadata that needs globally serialized replay, which is what the
> controller metadata is, I was a lot happier with N+2 to stay writable.
> But that's water under the bridge, and I'm just a spectator.
>
> Regards,
> - Michael
>
> * I hate this misuse of the word "quorum", but what can one do?
>
> On Tue, Feb 6, 2024, 8:51 AM Greg Harris <greg.har...@aiven.io.invalid>
> wrote:
>
> > Hi Dani,
> >
> > I believe the limitation that this documentation is hinting at is the
> > motivation for KIP-996 [1], and the notice in the documentation would
> > be removed once KIP-996 lands.
> > You can read the KIP for a brief explanation and link to a more
> > in-depth explanation of the failure scenario.
> >
> > While a 3-node quorum would typically be less reliable or available
> > than a 5-node quorum, it happens to be resistant to this failure mode
> > which makes the additional controllers liabilities instead of assets.
> > In the judgement of the maintainers at least, the risk of a network
> > partition which could trigger unavailability in a 5-node quorum is
> > higher than the risk of a 2-controller failure in a 3-node quorum, so
> > 3-node quorums are recommended.
> > You could do your own analysis and practical testing to make this
> > tradeoff yourself in your network context.
> >
> > I hope this helps!
> > Greg
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-996%3A+Pre-Vote
> >
> > On Tue, Feb 6, 2024 at 4:25 AM Daniel Saiz
> > <daniel.s...@shopify.com.invalid> wrote:
> > >
> > > Hello,
> > >
> > > I would like to clarify a statement I found in the KRaft
> > > documentation, in the deployment section [1]:
> > >
> > > > More than 3 controllers is not recommended in critical
> > > > environments. In the rare case of a partial network failure it is
> > > > possible for the cluster metadata quorum to become unavailable.
> > > > This limitation will be addressed in a future release of Kafka.
> > >
> > > I would like to clarify what is meant by that sentence, as
> > > intuitively I don't see why 3 replicas would be better than 5 (or
> > > more) for fault tolerance.
> > > What is the current limitation this is referring to?
> > >
> > > Thanks a lot.
> > >
> > > Cheers,
> > >
> > > Dani
> > >
> > > [1] https://kafka.apache.org/36/documentation.html#kraft_deployment
> >
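For readers comparing the 3-node and 5-node cases discussed above, here is a minimal sketch (not part of the original thread) of the standard Raft majority arithmetic. It only counts crashed voters; it does not model the partial-network-partition / disruptive-voter scenario that KIP-996 (Pre-Vote) is meant to address, which is the reason the documentation currently recommends 3 controllers.

```python
# Standard Raft/KRaft majority arithmetic: a quorum of N voters needs
# floor(N/2) + 1 voters reachable to elect a leader and commit records.
# This is illustrative only and ignores the KIP-996 pre-vote scenario.

def quorum_tolerance(voters: int) -> tuple[int, int]:
    """Return (majority size, number of voter failures tolerated)."""
    majority = voters // 2 + 1
    return majority, voters - majority

for n in (3, 5):
    majority, tolerated = quorum_tolerance(n)
    print(f"{n} voters: majority = {majority}, tolerates {tolerated} failure(s)")

# Expected output:
# 3 voters: majority = 2, tolerates 1 failure(s)
# 5 voters: majority = 3, tolerates 2 failure(s)
```

By this simple count a 5-voter quorum is strictly more fault tolerant, which is why the 3-controller recommendation is driven by the KIP-996 failure mode rather than by ordinary crash tolerance.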