Hi Pat,

Yes, ZOOKEEPER-153 could help in this case. The gist of the issue is reliable change notification with data. The linearizable read I had in mind might not solve this on its own, because it lacks the part that reliably captures change notifications.
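To make the gap concrete, here is a minimal Python sketch under stated assumptions: `ZNode`, `get_data`, `set_data` and `subscribers` are hypothetical names for an in-memory model, not the ZooKeeper client API. It contrasts a one-shot watch that carries no data with a hypothetical persistent subscription that delivers every change together with its data, roughly the "reliable change notification with data" discussed above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ZNode:
    """Hypothetical in-memory znode; NOT the ZooKeeper client API."""
    data: str = ""
    version: int = 0
    # One-shot watches: each fires on the next change only and carries no data.
    watches: List[Callable[[], None]] = field(default_factory=list)
    # Hypothetical persistent subscribers: notified of every change, with data.
    subscribers: List[Callable[[int, str], None]] = field(default_factory=list)

    def get_data(self, watch: Optional[Callable[[], None]] = None) -> Tuple[str, int]:
        # Read the current state and, optionally, arm a one-shot watch in the same step.
        if watch is not None:
            self.watches.append(watch)
        return self.data, self.version

    def set_data(self, data: str) -> None:
        self.data = data
        self.version += 1
        fired, self.watches = self.watches, []  # one-shot: cleared once triggered
        for w in fired:
            w()  # only says "something changed"
        for s in self.subscribers:
            s(self.version, data)  # every change, with its data


def demo() -> Tuple[List[int], List[int]]:
    node = ZNode()
    pending: List[str] = []        # notifications the controller has not handled yet
    seen_by_watch: List[int] = []  # versions observed via one-shot watches
    seen_by_sub: List[int] = []    # versions observed via the subscription

    node.subscribers.append(lambda version, _data: seen_by_sub.append(version))

    def on_change() -> None:
        pending.append("changed")  # queued; handled later, like an async event

    node.get_data(watch=on_change)  # initial read, arms the watch
    for isr in ["a,b,c", "a,b", "a", "a,b"]:  # four ISR updates in quick succession
        node.set_data(isr)

    # The controller handles the notification late: re-read and re-arm.
    while pending:
        pending.pop()
        _data, version = node.get_data(watch=on_change)
        seen_by_watch.append(version)
    return seen_by_watch, seen_by_sub


if __name__ == "__main__":
    print(demo())  # ([4], [1, 2, 3, 4])
```

The watch-based controller only ever sees version 4 (`a,b`), so it never learns that the ISR briefly shrank to `a`, while the subscriber sees versions 1 through 4. In real ZooKeeper, re-reading and re-arming the watch in a single `getData` call avoids missing the next change, but changes that happen between the event and the re-read are still collapsed into one notification, which is the gap the KIP-500 excerpt describes.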
> I'll also add that we haven't done any benchmarking in quite some time.

I think this is a very good point. The existing public benchmarks either target old versions or are not set up optimally. This creates a gap between the current scalability and performance of ZK and the existing (usually negative) public perception. With the many scale and performance improvements made in the last two years, the status quo is very different now.

On Fri, Aug 2, 2019 at 11:49 AM Patrick Hunt <[email protected]> wrote:
> Michael I think you are describing subscribe - this?
> https://issues.apache.org/jira/browse/ZOOKEEPER-153
> wasn't there some work done to keep tlogs around for a while? Or am I miss
> remembering? (fb folks?)
>
> I'll also add that we haven't done any benchmarking in quite some time. It
> would be interesting to collect a few of these use cases from the
> community, esp downstreams, and evaluate performance, see if we can
> address.
>
> Patrick
>
> On Fri, Aug 2, 2019 at 11:03 AM Michael Han <[email protected]> wrote:
>
> > Folks,
> >
> > Some of you might already see this. Comments?
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
> >
> > What caught my eyes are:
> >
> > *Worse still, although ZooKeeper is the store of record, the state in
> > ZooKeeper often doesn't match the state that is held in memory in the
> > controller. For example, when a partition leader changes its ISR in ZK,
> > the controller will typically not learn about these changes for many
> > seconds. There is no generic way for the controller to follow the
> > ZooKeeper event log. Although the controller can set one-shot watches,
> > the number of watches is limited for performance reasons. When a watch
> > triggers, it doesn't tell the controller the current state-- only that
> > the state has changed. By the time the controller re-reads the znode and
> > sets up a new watch, the state may have changed from what it was when
> > the watch originally fired. If there is no watch set, the controller may
> > not learn about the change at all. In some cases, restarting the
> > controller is the only way to resolve the discrepancy.*
> >
> > I've seen some similar zookeeper use cases that ended up like what's
> > described here. How can ZooKeeper solve this? It seems to me that the
> > only solution is to provide linearizable read on watched operations.
> > Thoughts?
> >
> > Michael.
