[
https://issues.apache.org/jira/browse/ZOOKEEPER-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vishal K updated ZOOKEEPER-919:
-------------------------------
Attachment: logs2.tar.gz
The default debug messages are not good enough to debug this bug.
Attaching more debug logs. I added more debug messages (starting with VK DEBUG).
What I am seeing is that after restart, while loading snapshot, the server is
still able to find an old znode in the snapshot. This is happening even if the
snapshot is taken after the transaction that deletes the session (and ephemeral
znode) from the ZooKeeper data tree.
For example, we can see that ephemeral znode (/fooOnline) is deleted during
transaction 0xf0000001a. See faulty-zookeeper.log.
2010-12-09 03:11:50,609 - TRACE [CommitProcessor:2:zootr...@89] -
:Esessionid:0x22cc91caece0000 type:closeSession cxid:0x0 zxid:0xf0000001a
txntype:-11 reqpath:n/a
2010-12-09 03:11:50,609 - DEBUG [CommitProcessor:2:finalrequestproces...@103] -
VK DEBUG: sessionid:0x22cc91caece0000 type:closeSession cxid:0x0
zxid:0xf0000001a txntype:-11 reqpath:n/a
2010-12-09 03:11:50,609 - DEBUG [CommitProcessor:2:datat...@751] - VK DEBUG:
header 156721212080128000,0,64424509466,1291864306011,-11
2010-12-09 03:11:50,618 - DEBUG [CommitProcessor:2:datat...@825] - VK DEBUG:
Found ephemeral node /fooOnline for session 0x22cc91caece0000
2010-12-09 03:11:50,619 - DEBUG [CommitProcessor:2:datat...@839] - Deleting
ephemeral node /fooOnline for session 0x22cc91caece0000
However, after several reboots, the znode still lingers in the data tree.
2010-12-09 03:12:11,762 - INFO [Thread-2:files...@82] - Reading snapshot
/var/zookeeper/version-2/snapshot.f0000001d (zxid f0000001d)
[...]
2010-12-09 03:12:11,768 - TRACE [Thread-2:zootr...@70] - loadData --- session
in archive: 22cc91caece0000 with timeout: 9000
2010-12-09 03:12:11,771 - DEBUG [Thread-2:datat...@1088] - VK DEBUG Adding path
/fooOnline for 22cc91caece0000
2010-12-09 03:13:22,573 - INFO [QuorumPeer:/0.0.0.0:2181:files...@82] - Reading
snapshot /var/zookeeper/version-2/snapshot.f0000002a (zxid f0000002a)
[...]
2010-12-09 03:13:22,579 - TRACE [QuorumPeer:/0.0.0.0:2181:zootr...@70] -
loadData --- session in archive: 22cc91caece0000 with timeout: 9000
[...]
2010-12-09 03:13:22,581 - DEBUG [QuorumPeer:/0.0.0.0:2181:datat...@1088] - VK
DEBUG Adding path /fooOnline for 22cc91caece0000
> Ephemeral nodes remains in one of ensemble after deliberate SIGKILL
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-919
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-919
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.1
> Environment: Linux CentOS 5.3 64bit, JDK 1.6.0-22
> SLES 11
> Reporter: Chang Song
> Priority: Blocker
> Fix For: 3.3.3, 3.4.0
>
> Attachments: logs.tar.gz, logs2.tar.gz
>
>
> I was testing stability of Zookeeper ensemble for production deployment.
> Three node ensemble cluster configuration.
> In a loop, I kill/restart three Zookeeper clients that created one ephemeral
> node each, and at the same time,
> I killed Java process on one of ensemble (dont' know if it was a leader or
> not). Then I restarted Zookeeper on the server,
> It turns out that on two zookeeper ensemble servers, all the ephemeral nodes
> are gone (it should), but on the newly started
> Zookeeper server, the two old ephemeral nodes stayed. The zookeeper didn't
> restart in standalone mode since new ephemeral
> nodes gets created on all ensemble servers.
> I captured the log.
> 2010-11-04 17:48:50,201 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioservercnxn$fact...@250] -
> Accepted socket connection from /10.25.131.21:11191
> 2010-11-04 17:48:50,202 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@776] - Client
> attempting to establish new session at /10.25.131.21:11191
> 2010-11-04 17:48:50,203 - INFO [CommitProcessor:1:nioserverc...@1579] -
> Established session 0x12c160c31fc000b with negotiated timeout 30000 for
> client /10.25.131.21:11191
> 2010-11-04 17:48:50,206 - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@633] -
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x12c160c31fc000b, likely client has closed socket
> 2010-11-04 17:48:50,207 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@1434] - Closed
> socket connection for client /10.25.131.21:11191 which had sessionid
> 0x12c160c31fc000b
> 2010-11-04 17:48:50,207 - ERROR [CommitProcessor:1:nioserverc...@444] -
> Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:417)
> at
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1508)
> at
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
> at
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.