KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum

Michael Han Fri, 02 Aug 2019 11:04:09 -0700

Folks,

Some of you may have already seen this. Comments?
https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum



What caught my eye is:

*Worse still, although ZooKeeper is the store of record, the state in
ZooKeeper often doesn't match the state that is held in memory in the
controller.  For example, when a partition leader changes its ISR in ZK,
the controller will typically not learn about these changes for many
seconds.  There is no generic way for the controller to follow the
ZooKeeper event log.  Although the controller can set one-shot watches, the
number of watches is limited for performance reasons.  When a watch
triggers, it doesn't tell the controller the current state-- only that the
state has changed.  By the time the controller re-reads the znode and sets
up a new watch, the state may have changed from what it was when the watch
originally fired.  If there is no watch set, the controller may not learn
about the change at all.  In some cases, restarting the controller is the
only way to resolve the discrepancy.*
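
To make the race in the quoted passage concrete, here is a minimal sketch using a toy in-memory store. It is not the real ZooKeeper client API: `ToyStore`, `ToyController`, `catch_up`, and the `/isr` path are hypothetical names chosen for illustration. It only models two properties the passage mentions, namely that a watch fires once and that the notification does not carry the new value.

```python
from typing import Callable, Dict, List, Optional


class ToyStore:
    """In-memory stand-in for a znode store with one-shot watches.

    Assumption: this is a simplified model, not the ZooKeeper API.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def set(self, path: str, value: str) -> None:
        self._data[path] = value
        # One-shot semantics: watches are removed before they fire.
        for callback in self._watches.pop(path, []):
            # The notification carries only the path, not the new value.
            callback(path)

    def get(self, path: str,
            watch: Optional[Callable[[str], None]] = None) -> str:
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]


class ToyController:
    """Hypothetical watcher that re-reads and re-arms only when it gets around to it."""

    def __init__(self, store: ToyStore, path: str) -> None:
        self.store = store
        self.path = path
        self.pending = False
        # Initial read arms the first watch.
        self.observed: List[str] = [store.get(path, watch=self._on_change)]

    def _on_change(self, path: str) -> None:
        # All the controller learns is that something changed.
        self.pending = True

    def catch_up(self) -> None:
        # Models the delayed reaction: re-read the value and set a new watch.
        if self.pending:
            self.pending = False
            self.observed.append(
                self.store.get(self.path, watch=self._on_change))


def demo() -> List[str]:
    store = ToyStore()
    store.set("/isr", "1,2,3")
    controller = ToyController(store, "/isr")
    store.set("/isr", "1,2")  # fires the one-shot watch
    store.set("/isr", "1")    # no watch is armed now, so this change is silent
    controller.catch_up()
    return controller.observed


if __name__ == "__main__":
    print(demo())
```

In this toy model the controller never observes the intermediate value "1,2": by the time it re-reads, the state has already moved on, and between the two writes its in-memory view still holds "1,2,3" while the store does not.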

I've seen similar ZooKeeper use cases that ended up in the situation
described here. How can ZooKeeper solve this? It seems to me that the only
solution is to provide linearizable reads on watched operations. Thoughts?

Michael.
