Hi Patrick,

You are correct: the test restarts both the ZooKeeper server and the client. The client opens a new connection after restarting and does not reuse the old session ID, so we would expect the old ephemeral znode (/foo) to expire after the session timeout. The client with the new session also creates the ephemeral znode (/foo) again after it reboots (it sets a watch on /foo and recreates /foo if it is deleted or doesn't exist). What I expect to see is that the older /foo expires, after which a new /foo gets created. Is my expectation correct?
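To spell out that expectation, here is a toy model in Python. It is not ZooKeeper code: the class and method names are made up for illustration, the timeout of 10 time units is arbitrary, and the only behaviour it assumes is that an ephemeral znode lives until its owning session expires and that creating an existing path fails. The two session IDs are the ones from the getstat outputs in this thread.

```python
class ToyEnsemble:
    """Toy model of session-bound ephemeral znodes (hypothetical, not ZooKeeper's implementation)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heard = {}   # session id -> time of the last heartbeat
        self.ephemerals = {}   # path -> owning session id

    def heartbeat(self, session_id, now):
        self.last_heard[session_id] = now

    def expire(self, now):
        # Drop sessions that have been silent longer than the timeout,
        # together with every ephemeral znode they own.
        expired = [s for s, t in self.last_heard.items()
                   if now - t > self.session_timeout]
        for s in expired:
            del self.last_heard[s]
        self.ephemerals = {p: o for p, o in self.ephemerals.items()
                           if o not in expired}

    def create_ephemeral(self, path, session_id, now):
        self.heartbeat(session_id, now)
        self.expire(now)
        if path in self.ephemerals:
            return False       # analogous to a "node exists" error
        self.ephemerals[path] = session_id
        return True

    def owner(self, path, now):
        self.expire(now)
        return self.ephemerals.get(path)


OLD, NEW = 0x2b7ce57ce40000, 0x2b7ce5bda40000
z = ToyEnsemble(session_timeout=10)        # arbitrary timeout, for illustration only

z.create_ephemeral("/foo", OLD, now=0)     # old client creates /foo, then is killed
first_try = z.create_ephemeral("/foo", NEW, now=5)   # old session has not expired yet
owner_before = z.owner("/foo", now=5)
owner_gone = z.owner("/foo", now=11)       # old session timed out, /foo deleted
second_try = z.create_ephemeral("/foo", NEW, now=11) # watch fires, client recreates /foo
owner_after = z.owner("/foo", now=12)
```

In this model the old /foo stays visible until the old session times out, and once the new /foo exists the old one never comes back. That second part is what A is not doing.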
What confuses me is the following output of 3 successive getstat /foo requests on A (the zxid, time and owner fields). Notice that the older znode reappeared. At the same time when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x100000607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x100000607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x100000607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

Thanks for your help.
-Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal the attachment seems to be getting removed by the list daemon (I
> don't have it), can you create a JIRA and attach? Also this is a good
> question for the ppl on zookeeper-user. (ccing)
>
> You are aware that ephemeral znodes are tied to the session? And that
> sessions only expire after the session timeout period? At which time any
> znodes created during that session are then deleted.
> The fact that you are
> "kill"ing your client process leads me to believe that you are not closing
> the session cleanly (meaning that it will eventually expire after the
> session timeout period), in which case the ephemeral znodes _should_
> reappear when A is restarted and successfully rejoins the cluster. (at
> least until the session timeout is exceeded)
>
> Patrick
>
> On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vishalm...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I
> > have a ZK client running that connects to the local server and creates an
> > ephemeral znode to indicate to clients on other nodes that it is online.
> >
> > I have a test script that reboots the zookeeper server as well as client on
> > A. The test does a getstat on the ephemeral znode created by the client on
> > A. I am seeing that the view of znodes on A is different from the other 2
> > nodes. I can tell this from the session ID that the client gets after
> > reconnecting to the local ZK server.
> >
> > So the test is simple:
> > - kill zookeeper server and client process
> > - wait for a few seconds
> > - do zkCli.sh stat ... > test.out
> >
> > What I am seeing is that the ephemeral znode with old zxid, time, and
> > session ID is reappearing on node A. I have attached the output of 3
> > consecutive getstat requests of the test (see client_getstat.out). Notice
> > that the third output is the same as the first one. That is, the old
> > ephemeral znode reappeared at A. However, both B and C are showing the
> > latest znode with correct time, zxid and session ID (output not attached).
> >
> > After this point, all following getstat requests on A are showing the old
> > znode. Whereas, B and C show the correct znode every time the client on A
> > comes online. This is something very perplexing. Earlier I thought this was
> > a bug in my client implementation. But the test shows that the ZK server on
> > A after reboot is out of sync with rest of the servers.
> >
> > The stat command to each server shows that the servers are in sync as far
> > as zxid's are concerned (see stat.out). So there is something wrong with A's
> > local database that is causing this problem.
> >
> > Has anyone seen this before? I will be doing more debugging in the next few
> > days. Comments/suggestions for further debugging are welcomed.
> >
> > -Vishal
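
For anyone who wants to check a run of getstat outputs mechanically, here is a minimal Python sketch that flags the regression seen on A. The helper names (parse_stat, went_backwards) are made up for this example, and the three outputs are abbreviated copies of the ones in this thread (only the cZxid, ctime and ephemeralOwner fields, plus one log4j line to show it is skipped). It assumes each field is printed on its own line as "key = value".

```python
def parse_stat(text):
    """Parse 'key = value' lines into a dict, skipping log4j warnings."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("log4j:") or " = " not in line:
            continue
        key, value = line.split(" = ", 1)
        fields[key.strip()] = value.strip()
    return fields


def went_backwards(stats):
    """Return the indices whose cZxid is lower than that of the previous output."""
    zxids = [int(s["cZxid"], 16) for s in stats]
    return [i for i in range(1, len(zxids)) if zxids[i] < zxids[i - 1]]


# Abbreviated copies of the three successive outputs from A.
OUTPUTS = [
    "log4j:WARN Please initialize the log4j system properly.\n"
    "cZxid = 0x1000005ef\n"
    "ctime = Tue Oct 05 15:00:50 UTC 2010\n"
    "ephemeralOwner = 0x2b7ce57ce40000\n",
    "log4j:WARN Please initialize the log4j system properly.\n"
    "cZxid = 0x100000607\n"
    "ctime = Tue Oct 05 15:01:07 UTC 2010\n"
    "ephemeralOwner = 0x2b7ce5bda40000\n",
    "log4j:WARN Please initialize the log4j system properly.\n"
    "cZxid = 0x1000005ef\n"
    "ctime = Tue Oct 05 15:00:50 UTC 2010\n"
    "ephemeralOwner = 0x2b7ce57ce40000\n",
]

stats = [parse_stat(o) for o in OUTPUTS]
regressions = went_backwards(stats)
print(regressions)  # [2]: the third output has an older cZxid than the second
```

The same check could be run against the outputs from B and C, where no regression would be expected if they keep showing the newer /foo.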