[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972582#action_12972582
 ] 

Camille Fournier commented on ZOOKEEPER-919:
--------------------------------------------

I don't think so, but correct me if I am wrong.
Starting the log from the last txn in the snapshot (which in this case is 
taken from the snapshot's file name, not from the last transaction the 
snapshot actually contains) will not fix this problem, since the snapshot 
does not contain the missing transactions. In fact, doing so would leave you 
worse off, because it widens the window in which the follower can crash and 
lose those transactions: not only before they are logged, but at any time 
before a snapshot that actually contains the missing transactions is taken.
In the patch submitted on ZOOKEEPER-882, we still don't actually use the log 
file's last transaction id anywhere, so as far as I can tell that fix does 
nothing for the actual functioning of the system. If we were using it, we 
probably wouldn't have this problem.
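
To make the argument above concrete, here is a toy model of the two crash 
windows. It is not ZooKeeper code: Snapshot, recover, and the zxid values are 
all made up for illustration, and it assumes that "starting the log from" a 
zxid means replaying only logged txns with a greater zxid.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    name_zxid: int  # zxid taken from the snapshot file name
    applied: set = field(default_factory=set)  # zxids the data really reflects


def recover(snapshot, txn_log, replay_after):
    """Restore the snapshot, then replay logged txns with zxid > replay_after."""
    state = set(snapshot.applied)
    for zxid in txn_log:
        if zxid > replay_after:
            state.add(zxid)
    return state


# Hypothetical scenario: the file is named for zxid 5, but the data only
# reflects txns 1..3, so txns 4 and 5 are the "missing" ones.
snap = Snapshot(name_zxid=5, applied={1, 2, 3})

# Crash before the missing txns were logged: nothing on disk records them.
log_unlogged = [1, 2, 3]
# Crash after they (and a later txn 6) were logged.
log_logged = [1, 2, 3, 4, 5, 6]

# Starting from the file-name zxid skips 4 and 5 even though they are logged.
from_name_unlogged = recover(snap, log_unlogged, snap.name_zxid)
from_name_logged = recover(snap, log_logged, snap.name_zxid)

# For comparison only: starting from the last txn the snapshot really contains.
from_contained_logged = recover(snap, log_logged, max(snap.applied))

print(sorted(from_name_unlogged))
print(sorted(from_name_logged))
print(sorted(from_contained_logged))
```

In this toy run, starting from the file-name zxid loses txns 4 and 5 in both 
crash cases, including the one where they had already reached the log. That 
is the wider window described above. The last line is only a point of 
comparison within the model, not a claim about what the actual fix should be.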

> Ephemeral nodes remains in one of ensemble after deliberate SIGKILL
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-919
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-919
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.1
>         Environment: Linux CentOS 5.3 64bit, JDK 1.6.0-22
> SLES 11
>            Reporter: Chang Song
>            Priority: Blocker
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: logs.tar.gz, logs2.tar.gz, logs3.tar.gz, zk.patch
>
>
> I was testing the stability of a ZooKeeper ensemble for production 
> deployment, using a three-node ensemble configuration.
> In a loop, I kill/restart three ZooKeeper clients that each created one 
> ephemeral node, and at the same time I killed the Java process on one of the 
> ensemble servers (I don't know whether it was the leader or not). Then I 
> restarted ZooKeeper on that server.
> It turns out that on two of the ensemble servers all the ephemeral nodes are 
> gone (as they should be), but on the newly restarted server the two old 
> ephemeral nodes stayed. The server didn't restart in standalone mode, since 
> new ephemeral nodes get created on all ensemble servers.
> I captured the log.
> 2010-11-04 17:48:50,201 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioservercnxn$fact...@250] - Accepted socket connection from /10.25.131.21:11191
> 2010-11-04 17:48:50,202 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@776] - Client attempting to establish new session at /10.25.131.21:11191
> 2010-11-04 17:48:50,203 - INFO  [CommitProcessor:1:nioserverc...@1579] - Established session 0x12c160c31fc000b with negotiated timeout 30000 for client /10.25.131.21:11191
> 2010-11-04 17:48:50,206 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x12c160c31fc000b, likely client has closed socket
> 2010-11-04 17:48:50,207 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:nioserverc...@1434] - Closed socket connection for client /10.25.131.21:11191 which had sessionid 0x12c160c31fc000b
> 2010-11-04 17:48:50,207 - ERROR [CommitProcessor:1:nioserverc...@444] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:417)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1508)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
>         at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
