I debugged this on the plane. I think it's due to the transaction log not saving the createSession transaction. Will give more details later.
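One way to check this theory against the attached data directories would be to dump the transaction logs with ZooKeeper's LogFormatter and look for a createSession record for the stale session (0xa234f4f3bc220001 in the zkCli output quoted below). A rough sketch; the jar/classpath names, install path, and log file layout are illustrative and need adjusting to the actual 3.4.x setup:

```shell
# Dump every txn log in the captured dataDir and look for session
# create/close records for the suspect session id. Paths below are
# assumptions, not the reporter's actual layout.
cd /path/to/zookeeper-3.4.2   # hypothetical install dir
for f in data/version-2/log.*; do
  java -cp zookeeper-3.4.2.jar:lib/slf4j-api-1.6.1.jar \
       org.apache.zookeeper.server.LogFormatter "$f"
done | grep -iE 'createSession|closeSession' | grep -i a234f4f3bc220001
```

If no createSession line shows up for that session id, that would support the theory that the session's create was never persisted to the log.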
C

From my phone
On Jan 27, 2012 2:58 PM, "Patrick Hunt (Commented) (JIRA)" <j...@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195053#comment-13195053 ]
>
> Patrick Hunt commented on ZOOKEEPER-1367:
> -----------------------------------------
>
> Jeremy - can the cleanup scripts be modified to gzip the old logs rather
> than rm'ing them? That might allow you to provide us with the full logs for
> reproducing the issue.
>
> > Data inconsistencies and unexpired ephemeral nodes after cluster restart
> > ------------------------------------------------------------------------
> >
> >                 Key: ZOOKEEPER-1367
> >                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1367
> >             Project: ZooKeeper
> >          Issue Type: Bug
> >          Components: server
> >    Affects Versions: 3.4.2
> >         Environment: Debian Squeeze, 64-bit
> >            Reporter: Jeremy Stribling
> >            Priority: Blocker
> >             Fix For: 3.4.3
> >
> >         Attachments: ZOOKEEPER-1367.tgz
> >
> >
> > In one of our tests, we have a cluster of three ZooKeeper servers. We
> > kill all three, and then restart just two of them. Sometimes we notice
> > that on one of the restarted servers, ephemeral nodes from previous
> > sessions do not get deleted, while on the other server they do. We are
> > effectively running 3.4.2, though technically we are running 3.4.1 with
> > the patch manually applied for ZOOKEEPER-1333 and a C client for 3.4.1
> > with the patches for ZOOKEEPER-1163.
> > I noticed that when I connected using zkCli.sh to the first node
> > (90.0.0.221, zkid 84), I saw only one znode in a particular path:
> > {quote}
> > [zk: 90.0.0.221:2888(CONNECTED) 0] ls /election/zkrsm
> > [nominee0000000011]
> > [zk: 90.0.0.221:2888(CONNECTED) 1] get /election/zkrsm/nominee0000000011
> > 90.0.0.222:7777
> > cZxid = 0x400000027
> > ctime = Thu Jan 19 08:18:24 UTC 2012
> > mZxid = 0x400000027
> > mtime = Thu Jan 19 08:18:24 UTC 2012
> > pZxid = 0x400000027
> > cversion = 0
> > dataVersion = 0
> > aclVersion = 0
> > ephemeralOwner = 0xa234f4f3bc220001
> > dataLength = 16
> > numChildren = 0
> > {quote}
> > However, when I connect zkCli.sh to the second server (90.0.0.222, zkid
> > 251), I saw three znodes under that same path:
> > {quote}
> > [zk: 90.0.0.222:2888(CONNECTED) 2] ls /election/zkrsm
> > nominee0000000006 nominee0000000010 nominee0000000011
> > [zk: 90.0.0.222:2888(CONNECTED) 2] get /election/zkrsm/nominee0000000011
> > 90.0.0.222:7777
> > cZxid = 0x400000027
> > ctime = Thu Jan 19 08:18:24 UTC 2012
> > mZxid = 0x400000027
> > mtime = Thu Jan 19 08:18:24 UTC 2012
> > pZxid = 0x400000027
> > cversion = 0
> > dataVersion = 0
> > aclVersion = 0
> > ephemeralOwner = 0xa234f4f3bc220001
> > dataLength = 16
> > numChildren = 0
> > [zk: 90.0.0.222:2888(CONNECTED) 3] get /election/zkrsm/nominee0000000010
> > 90.0.0.221:7777
> > cZxid = 0x30000014c
> > ctime = Thu Jan 19 07:53:42 UTC 2012
> > mZxid = 0x30000014c
> > mtime = Thu Jan 19 07:53:42 UTC 2012
> > pZxid = 0x30000014c
> > cversion = 0
> > dataVersion = 0
> > aclVersion = 0
> > ephemeralOwner = 0xa234f4f3bc220000
> > dataLength = 16
> > numChildren = 0
> > [zk: 90.0.0.222:2888(CONNECTED) 4] get /election/zkrsm/nominee0000000006
> > 90.0.0.223:7777
> > cZxid = 0x200000cab
> > ctime = Thu Jan 19 08:00:30 UTC 2012
> > mZxid = 0x200000cab
> > mtime = Thu Jan 19 08:00:30 UTC 2012
> > pZxid = 0x200000cab
> > cversion = 0
> > dataVersion = 0
> > aclVersion = 0
> > ephemeralOwner = 0x5434f5074e040002
> > dataLength = 16
> > numChildren = 0
> > {quote}
> > These never went away for the lifetime of the server, for any clients
> > connected directly to that server. Note that this cluster is configured
> > to have all three servers still, the third one being down (90.0.0.223,
> > zkid 162).
> >
> > I captured the data/snapshot directories for the two live servers.
> > When I start single-node servers using each directory, I can briefly see
> > that the inconsistent data is present in those logs, though the ephemeral
> > nodes seem to get (correctly) cleaned up pretty soon after I start the
> > server.
> >
> > I will upload a tar containing the debug logs and data directories from
> > the failure. I think we can reproduce it regularly if you need more info.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira