Sure, I will reproduce it with DEBUG logging enabled and create a JIRA. Thanks.

On Thu, Oct 7, 2010 at 12:23 PM, Patrick Hunt <ph...@apache.org> wrote:

> Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
> description, your configuration files from all servers, and the log files
> from all servers during the time of the incident? If you could run the
> servers in DEBUG level logging during the time you reproduce the issue that
> would probably help:
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Thanks!
>
> Patrick
>
>
> On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vishalm...@gmail.com> wrote:
>
>> Hi Patrick,
>>
>> You are correct, the test restarts both the ZooKeeper server and the client.
>> The client opens a new connection after restarting, so we would expect the
>> ephemeral znode (/foo) to expire after the session timeout. However, the
>> client with the new session creates the ephemeral znode (/foo) again after
>> it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
>> doesn't exist). The client is not reusing the session ID. What I expect to
>> see is that the older /foo should expire, after which a new /foo should get
>> created. Is my expectation correct?
>>
>> What confuses me is the following output of 3 successive getstat /foo
>> requests on A (note the zxid, time and owner fields). Notice that the older
>> znode reappeared. At the same time, when I do getstat at B and C, I see the
>> newer /foo.
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x100000607
>> ctime = Tue Oct 05 15:01:07 UTC 2010
>> mZxid = 0x100000607
>> mtime = Tue Oct 05 15:01:07 UTC 2010
>> pZxid = 0x100000607
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce5bda40000
>> dataLength = 54
>> numChildren = 0
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.zookeeper.ZooKeeper).
>> log4j:WARN Please initialize the log4j system properly.
>> cZxid = 0x1000005ef
>> ctime = Tue Oct 05 15:00:50 UTC 2010
>> mZxid = 0x1000005ef
>> mtime = Tue Oct 05 15:00:50 UTC 2010
>> pZxid = 0x1000005ef
>> cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x2b7ce57ce40000
>> dataLength = 54
>> numChildren = 0
>>
>> Thanks for your help.
>>
>> -Vishal
>>
>> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote:
>>
>> > Vishal, the attachment seems to be getting removed by the list daemon (I
>> > don't have it). Can you create a JIRA and attach it? Also, this is a good
>> > question for the ppl on zookeeper-user (ccing).
>> >
>> > You are aware that ephemeral znodes are tied to the session? And that
>> > sessions only expire after the session timeout period, at which time any
>> > znodes created during that session are deleted? The fact that you are
>> > "kill"ing your client process leads me to believe that you are not closing
>> > the session cleanly (meaning that it will eventually expire after the
>> > session timeout period), in which case the ephemeral znodes _should_
>> > reappear when A is restarted and successfully rejoins the cluster (at
>> > least until the session timeout is exceeded).
>> >
>> > Patrick
>> >
>> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vishalm...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I
>> > > have a ZK client running that connects to the local server and creates
>> > > an ephemeral znode to indicate to clients on other nodes that it is
>> > > online.
>> > >
>> > > I have a test script that reboots the ZooKeeper server as well as the
>> > > client on A. The test does a getstat on the ephemeral znode created by
>> > > the client on A. I am seeing that the view of znodes on A is different
>> > > from the other 2 nodes. I can tell this from the session ID that the
>> > > client gets after reconnecting to the local ZK server.
>> > >
>> > > So the test is simple:
>> > > - kill zookeeper server and client process
>> > > - wait for a few seconds
>> > > - do zkCli.sh stat ... > test.out
>> > >
>> > > What I am seeing is that the ephemeral znode with the old zxid, time,
>> > > and session ID is reappearing on node A. I have attached the output of 3
>> > > consecutive getstat requests of the test (see client_getstat.out).
>> > > Notice that the third output is the same as the first one. That is, the
>> > > old ephemeral znode reappeared at A. However, both B and C are showing
>> > > the latest znode with the correct time, zxid and session ID (output not
>> > > attached).
>> > >
>> > > After this point, all following getstat requests on A show the old
>> > > znode, whereas B and C show the correct znode every time the client on A
>> > > comes online. This is very perplexing. Earlier I thought this was a bug
>> > > in my client implementation, but the test shows that the ZK server on A
>> > > is out of sync with the rest of the servers after reboot.
>> > >
>> > > The stat command to each server shows that the servers are in sync as
>> > > far as zxids are concerned (see stat.out). So there is something wrong
>> > > with A's local database that is causing this problem.
>> > >
>> > > Has anyone seen this before? I will be doing more debugging in the next
>> > > few days. Comments/suggestions for further debugging are welcome.
>> > >
>> > > -Vishal
>> > >
>> > >
>> > >
>> >
>>
>
>
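
For anyone comparing stat output the way this thread does, here is a rough Python sketch that parses the "key = value" lines and flags a sample whose cZxid is lower than one seen earlier, which is how the reappearing /foo shows up above. The function names (parse_stat, split_zxid, find_regressions) are hypothetical helpers, not part of ZooKeeper or zkCli.sh. The only ZooKeeper fact assumed is that a zxid is a 64-bit value with the epoch in the high 32 bits and a counter in the low 32 bits.

```python
def parse_stat(text):
    """Parse 'key = value' lines from stat output, skipping log noise."""
    fields = {}
    for line in text.splitlines():
        # Drop mail quote markers such as ">> " in front of each line.
        line = line.lstrip("> ").strip()
        if " = " not in line:
            continue  # log4j warnings and blank lines carry no fields
        key, value = line.split(" = ", 1)
        fields[key.strip()] = value.strip()
    return fields


def split_zxid(zxid):
    """Split a hex zxid string into (epoch, counter)."""
    n = int(zxid, 16)
    return n >> 32, n & 0xFFFFFFFF


def find_regressions(samples):
    """Return indices of samples whose cZxid is lower than an earlier one."""
    regressions = []
    highest = -1
    for i, fields in enumerate(samples):
        czxid = int(fields["cZxid"], 16)
        if czxid < highest:
            regressions.append(i)
        highest = max(highest, czxid)
    return regressions


# The three samples taken on A, reduced to the fields that matter here.
samples = [
    parse_stat("cZxid = 0x1000005ef\nephemeralOwner = 0x2b7ce57ce40000"),
    parse_stat("cZxid = 0x100000607\nephemeralOwner = 0x2b7ce5bda40000"),
    parse_stat("cZxid = 0x1000005ef\nephemeralOwner = 0x2b7ce57ce40000"),
]

print(find_regressions(samples))   # [2]: the third sample is the older znode
print(split_zxid("0x100000607"))   # (1, 1543): same epoch, later counter
```

Running the same check on the output collected from B and C would show whether only A returns the older znode.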
