Hi all,

I think Jordan is correct about this.

> In the case of reconnection, is it at least guaranteed you'll get some
> kind of client notification about the connection state, so you can
> reset any application state that relies on watchers being notified?
> Like, you may not see a node changed during the disconnected state,
> but will you at least get a connected notification from the persistent
> watcher? Or do you have to rely on the connection status watcher set
> when the client was created to see those?

Yes. Persistent watchers are guaranteed to receive connection state
notifications in the same channel (or callback) as node change events.
There is a test for this. [1]

> The problem is that missing notifications seem only being triggered for
> standard watches but not for persistent watches when reconnecting.

This is misleading. What is sent to the client is not the "missing
notifications" but simply the last state, so all intermediate changes
are lost. Jordan has already pointed this out. This is what the doc states:

> There is one case where a watch may be missed: a watch for the existence of a 
> znode not yet created will be missed if the znode is created and deleted 
> while disconnected.

Basically, on reconnection we fire node change events based on the
current `DataTree` (i.e., the snapshot), not on log entries.
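
To make that concrete, here is a toy model of a snapshot-based reset. It
is a sketch, not ZooKeeper code: the class `SnapshotResetSketch` and its
methods are made up for illustration, and it only mirrors the shape of the
data-watch branch quoted further down in this thread. The tree remembers
only the latest modification zxid per path, so a reset can report at most
one event per path no matter how many changes happened while the client
was away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not ZooKeeper code): keeps only the latest
// modification zxid per path, like a snapshot without a change log.
class SnapshotResetSketch {
    // path -> zxid of the last modification (stands in for the node's mzxid).
    private final Map<String, Long> mzxidByPath = new HashMap<>();
    private long nextZxid = 1;

    void setData(String path) {
        mzxidByPath.put(path, nextZxid++);
    }

    void delete(String path) {
        mzxidByPath.remove(path);
        nextZxid++;
    }

    long lastZxid() {
        return nextZxid - 1;
    }

    // At most one event per path, derived from the current state only.
    List<String> resetDataWatches(List<String> paths, long relativeZxid) {
        List<String> events = new ArrayList<>();
        for (String path : paths) {
            Long mzxid = mzxidByPath.get(path);
            if (mzxid == null) {
                events.add("NodeDeleted " + path);
            } else if (mzxid > relativeZxid) {
                events.add("NodeDataChanged " + path);
            }
            // Otherwise nothing changed and the watch is just re-registered.
        }
        return events;
    }

    public static void main(String[] args) {
        SnapshotResetSketch tree = new SnapshotResetSketch();
        tree.setData("/a");                    // zxid 1
        long seenByClient = tree.lastZxid();   // client disconnects here
        tree.setData("/a");                    // zxid 2
        tree.setData("/a");                    // zxid 3
        tree.setData("/a");                    // zxid 4
        // Three changes happened, but only one event comes out.
        System.out.println(tree.resetDataWatches(Arrays.asList("/a"), seenByClient));
    }
}
```

Running `main` prints `[NodeDataChanged /a]`: the three intermediate
writes collapse into a single "something changed" signal.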

If you are building a cache, I recommend rebuilding it after
reconnection. From my point of view, that is the safest option. [2]
This is also how Apache Curator handles disconnection.
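
As a rough sketch of that recommendation: the class below is hypothetical
and deliberately avoids the real client API so it runs on its own. `Type`
and `State` stand in for the client's event type and keeper state, and
`fullRead` is an assumed loader that re-reads the whole watched subtree.
The point is only that state events and node events arrive in one
callback, and that a reconnect after a disconnect throws away the cache
and reloads it instead of trusting incremental events.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch only. Type and State are hypothetical stand-ins for the client's
// event type and keeper state enums.
class RebuildOnReconnectCache {
    enum Type { NONE, NODE_CREATED, NODE_DATA_CHANGED, NODE_DELETED }
    enum State { SYNC_CONNECTED, DISCONNECTED }

    private final Map<String, String> cache = new HashMap<>();
    // Hypothetical loader that re-reads the whole watched subtree.
    private final Supplier<Map<String, String>> fullRead;
    private boolean sawDisconnect = false;

    RebuildOnReconnectCache(Supplier<Map<String, String>> fullRead) {
        this.fullRead = fullRead;
        cache.putAll(fullRead.get()); // initial load
    }

    // One callback for both state and node events, as with a persistent
    // watcher. Passing the data in is a simplification; a real handler
    // would read it from the server.
    void onEvent(Type type, State state, String path, String data) {
        if (type == Type.NONE) {
            if (state == State.DISCONNECTED) {
                sawDisconnect = true;
            } else if (state == State.SYNC_CONNECTED && sawDisconnect) {
                // Changes made while disconnected may be invisible: rebuild.
                sawDisconnect = false;
                cache.clear();
                cache.putAll(fullRead.get());
            }
            return;
        }
        if (type == Type.NODE_DELETED) {
            cache.remove(path);
        } else {
            cache.put(path, data);
        }
    }

    String get(String path) {
        return cache.get(path);
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put("/a", "1");
        RebuildOnReconnectCache c =
            new RebuildOnReconnectCache(() -> new HashMap<>(server));
        c.onEvent(Type.NONE, State.DISCONNECTED, null, null);
        server.put("/a", "2"); // never observed by the cache
        server.put("/a", "3");
        c.onEvent(Type.NONE, State.SYNC_CONNECTED, null, null);
        System.out.println(c.get("/a"));
    }
}
```

Running `main` prints `3`. A real implementation would also have to deal
with node events that arrive while the rebuild is in progress, which this
sketch ignores.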

[1]: https://github.com/apache/zookeeper/blob/3d6c0d1164dc9ec96a02de383e410b1b0ef64565/zookeeper-server/src/test/java/org/apache/zookeeper/test/PersistentRecursiveWatcherTest.java#L151-L161
[2]: https://github.com/apache/zookeeper/pull/1950#issuecomment-1557685392

Best,
Kezhu Wang



On Sat, Jul 26, 2025 at 10:26 AM Kezhu Wang <kez...@gmail.com> wrote:
>
> Hi all,
>
> There is a jira issue:
> https://issues.apache.org/jira/browse/ZOOKEEPER-4698, it has links to
> more context.
>
> Best,
> Kezhu Wang
>
> On Sat, Jul 26, 2025 at 5:06 AM Jordan Zimmerman
> <jor...@jordanzimmerman.com> wrote:
> >
> > Here's a summary:
> >
> > On reconnect, watches are reset. For Data watches, if the node no longer 
> > exists, the watch will get NodeDeleted. If the node's zxId is different, 
> > the watch will get NodeDataChanged. Exist and child nodes have similar 
> > handling. Persistent watches, on the other hand, are merely reset.
> >
> > I no longer remember why we didn't mimic this for Persistent watches. I 
> > guess it can be argued that it isn't necessary or that it could result in a 
> > _lot_ of persistent watch calls. Maybe the right thing to do is to just 
> > document the difference and leave it as it's been this way for years.
> >
> > -Jordan
> >
> > > On Jul 25, 2025, at 9:58 PM, Keith Turner <ktur...@apache.org> wrote:
> > >
> > >
> > >
> > > On 2025/07/25 19:23:41 Jordan Zimmerman wrote:
> > >> Hi,
> > >>
> > >> I took a look at the code (which I haven't looked at in 5 or more 
> > >> years). It looks like the reconnection behavior _is_ different. 
> > >> Persistent watches will miss some events that other watches are getting. 
> > >> This is indeed a very long-standing bug.
> > >
> > > What events are missed for persistent recursive watchers that normal 
> > > watcher see?
> > >
> > >>
> > >> I'd be willing to work on this, but there's likely devs who are more 
> > >> familiar with the code now who can do it.
> > >>
> > >> -JZ
> > >>
> > >>> On Jul 25, 2025, at 8:06 PM, Jordan Zimmerman 
> > >>> <jor...@jordanzimmerman.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Persistent watches are the same watch as every other watch. It all goes 
> > >>> through the same code. Let's look at the doc:
> > >>>
> > >>>> Because standard watches are one time triggers and there is latency 
> > >>>> between getting the event and sending a new request to
> > >>>> get a watch you cannot reliably see every change that happens to a 
> > >>>> node in ZooKeeper. Be prepared to handle the case where
> > >>>> the znode changes multiple times between getting the event and setting 
> > >>>> the watch again. (You may not care, but at least realize it may 
> > >>>> happen.)
> > >>>
> > >>> ZooKeeper does not keep any kind of queue of events. You cannot count 
> > >>> on seeing every event in ZooKeeper. Watchers are triggered as events 
> > >>> happen.
> > >>> Again, it's been a very long time since I've looked at the code but 
> > >>> this is my memory of how it works. When I wrote Persistent watches, I 
> > >>> used all
> > >>> the existing watch code. A Persistent watch is the exact same code path 
> > >>> as all other watches. The only difference is that they don't get 
> > >>> deleted after
> > >>> firing. Also, recursive watches trigger for child nodes being watched. 
> > >>> But, again, same code path.
> > >>>
> > >>> I hope this helps.
> > >>>
> > >>> -JZ
> > >>>
> > >>>
> > >>>> On Jul 25, 2025, at 7:30 PM, Li Wang <li4w...@gmail.com> wrote:
> > >>>>
> > >>>> Thanks for the input, Jordan.
> > >>>>
> > >>>> My understanding is that the standard watches do but persistent watches
> > >>>> don't. Not sure if I miss anything or if this is a bug. Looking 
> > >>>> forward to
> > >>>> any feedback/input on this.
> > >>>>
> > >>>> 1.  We have the following in the standard watch section of Zookeeper
> > >>>> documentation and it looks like missing notifications are triggered.
> > >>>>
> > >>>> When a client reconnects, any previously registered watches will be
> > >>>>> reregistered and triggered if needed.
> > >>>>
> > >>>>
> > >>>>
> > >>>> https://zookeeper.apache.org/doc/r3.9.3/zookeeperProgrammers.html#sc_WatchSemantics
> > >>>>
> > >>>>
> > >>>> 2. In the code base, Zookeeper client library maintains lastZXid in 
> > >>>> memory
> > >>>> and sends it to the server when resetting watches upon reconnection. 
> > >>>> The
> > >>>> server detects if any missing notifications need to be triggered based 
> > >>>> on
> > >>>> the lastZxid.
> > >>>>
> > >>>> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L1040-L1041
> > >>>> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1497
> > >>>>
> > >>>> 3. The problem is that missing notifications seem only being triggered 
> > >>>> for
> > >>>> standard watches but not for persistent watches when reconnecting.
> > >>>>
> > >>>> For example, for standard watches, watches.process() is invoked for 
> > >>>> sending
> > >>>> missing notifications.
> > >>>>
> > >>>> for (String path : dataWatches) {
> > >>>>     DataNode node = getNode(path);
> > >>>>     if (node == null) {
> > >>>>         watcher.process(new WatchedEvent(EventType.NodeDeleted, KeeperState.SyncConnected, path));
> > >>>>     } else if (node.stat.getMzxid() > relativeZxid) {
> > >>>>         watcher.process(new WatchedEvent(EventType.NodeDataChanged, KeeperState.SyncConnected, path));
> > >>>>     } else {
> > >>>>         this.dataWatches.addWatch(path, watcher);
> > >>>>     }
> > >>>> }
> > >>>>
> > >>>>
> > >>>> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1494-L1521
> > >>>>
> > >>>> However, for persistent watches, we only register the watches, not
> > >>>> detecting and sending missing notifications.
> > >>>>
> > >>>> for (String path : persistentRecursiveWatches) {
> > >>>>     this.dataWatches.addWatch(path, watcher, WatcherMode.PERSISTENT_RECURSIVE);
> > >>>> }
> > >>>>
> > >>>>
> > >>>> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1494-L1521
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Li
> > >>>
> > >>
> > >>
> >
