[jira] Updated: (ZOOKEEPER-919) Ephemeral nodes remains in one of ensemble after deliberate SIGKILL

Vishal K (JIRA) Thu, 09 Dec 2010 01:04:33 -0800

     [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vishal K updated ZOOKEEPER-919:
-------------------------------

    Attachment: logs2.tar.gz

The default debug messages are not good enough to debug this bug.
Attaching more debug logs. I added more debug messages (starting with VK DEBUG).

What I am seeing is that after restart, while loading snapshot, the server is 
still able to find an old znode in the snapshot. This is happening even if the 
snapshot is taken after the transaction that deletes the session (and ephemeral 
znode) from the ZooKeeper data tree.

For example, we can see that ephemeral znode (/fooOnline) is deleted during 
transaction 0xf0000001a. See faulty-zookeeper.log.

2010-12-09 03:11:50,609 - TRACE [CommitProcessor:2:zootr...@89] - 
:Esessionid:0x22cc91caece0000 type:closeSession cxid:0x0 zxid:0xf0000001a 
txntype:-11 reqpath:n/a
2010-12-09 03:11:50,609 - DEBUG [CommitProcessor:2:finalrequestproces...@103] - 
VK DEBUG: sessionid:0x22cc91caece0000 type:closeSession cxid:0x0 
zxid:0xf0000001a txntype:-11 reqpath:n/a
2010-12-09 03:11:50,609 - DEBUG [CommitProcessor:2:datat...@751] - VK DEBUG: 
header 156721212080128000,0,64424509466,1291864306011,-11
2010-12-09 03:11:50,618 - DEBUG [CommitProcessor:2:datat...@825] - VK DEBUG: 
Found ephemeral node /fooOnline for session 0x22cc91caece0000
2010-12-09 03:11:50,619 - DEBUG [CommitProcessor:2:datat...@839] - Deleting 
ephemeral node /fooOnline for session 0x22cc91caece0000

However, after several reboots, the znode still lingers in the data tree.

2010-12-09 03:12:11,762 - INFO [Thread-2:files...@82] - Reading snapshot 
/var/zookeeper/version-2/snapshot.f0000001d (zxid f0000001d)
[...]
2010-12-09 03:12:11,768 - TRACE [Thread-2:zootr...@70] - loadData --- session 
in archive: 22cc91caece0000 with timeout: 9000
2010-12-09 03:12:11,771 - DEBUG [Thread-2:datat...@1088] - VK DEBUG Adding path 
/fooOnline for 22cc91caece0000


2010-12-09 03:13:22,573 - INFO [QuorumPeer:/0.0.0.0:2181:files...@82] - Reading 
snapshot /var/zookeeper/version-2/snapshot.f0000002a (zxid f0000002a)
[...]
2010-12-09 03:13:22,579 - TRACE [QuorumPeer:/0.0.0.0:2181:zootr...@70] - 
loadData --- session in archive: 22cc91caece0000 with timeout: 9000
[...]
2010-12-09 03:13:22,581 - DEBUG [QuorumPeer:/0.0.0.0:2181:datat...@1088] - VK 
DEBUG Adding path /fooOnline for 22cc91caece0000


> Ephemeral nodes remains in one of ensemble after deliberate SIGKILL
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-919
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-919
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux CentOS 5.3 64bit, JDK 1.6.0-22
> SLES 11
>            Reporter: Chang Song
>            Priority: Blocker
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: logs.tar.gz, logs2.tar.gz
>
>
> I was testing stability of Zookeeper ensemble for production deployment. 
> Three node ensemble cluster configuration.
> In a loop, I kill/restart three Zookeeper clients that created one ephemeral 
> node each, and at the same time,
> I killed Java process on one of ensemble (dont' know if it was a leader or 
> not). Then I restarted Zookeeper on the server,
> It turns out that on two zookeeper ensemble servers, all the ephemeral nodes 
> are gone (it should), but on the newly started
> Zookeeper server, the two old ephemeral nodes stayed.  The zookeeper didn't 
> restart in standalone mode since new ephemeral
> nodes gets created on all ensemble servers. 
> I captured the log.
> 2010-11-04 17:48:50,201 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioservercnxn$fact...@250] - 
> Accepted socket connection from /10.25.131.21:11191
> 2010-11-04 17:48:50,202 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@776] - Client 
> attempting to establish new session at /10.25.131.21:11191
> 2010-11-04 17:48:50,203 - INFO  [CommitProcessor:1:nioserverc...@1579] - 
> Established session 0x12c160c31fc000b with negotiated timeout 30000 for 
> client /10.25.131.21:11191
> 2010-11-04 17:48:50,206 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@633] - 
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x12c160c31fc000b, likely client has closed socket
> 2010-11-04 17:48:50,207 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@1434] - Closed 
> socket connection for client /10.25.131.21:11191 which had sessionid 
> 0x12c160c31fc000b
> 2010-11-04 17:48:50,207 - ERROR [CommitProcessor:1:nioserverc...@444] - 
> Unexpected Exception:
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:417)
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1508)
>         at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
>         at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-919) Ephemeral nodes remains in one of ensemble after deliberate SIGKILL

Reply via email to