[ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400257#comment-13400257 ]
Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------
I was able to reproduce this problem. After I power-cycled the server a few
times, the affected node refused to rejoin the ensemble and no clients could
connect to it. When I telnet to that host and issue 'stat', it fails with:
This ZooKeeper instance is not currently serving requests
I enabled tracing, and the log file shows the following failure during startup:
2012-06-24 20:34:31,734 [myid:1] - INFO [main:FileSnap@83][] - Reading snapshot /sf/data/zookeeper/10.10.5.123/version-2/snapshot.0
2012-06-24 20:34:31,738 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575][] - Created new input stream /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:31,738 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578][] - Created new input archive /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:FileTxnSnapLog@241][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
...[ repeats many many times ]...
2012-06-24 20:34:32,065 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618][] - EOF excepton java.io.EOFException: Failed to read /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:32,067 [myid:1] - INFO [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /10.10.5.123:39623
2012-06-24 20:34:32,069 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /10.10.5.123:2183
2012-06-24 20:34:32,071 [myid:1] - WARN [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-06-24 20:34:32,071 [myid:1] - DEBUG [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxn@358][] - IOException stack trace
I also have a copy of the data directory if it would help.
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-1453
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3
> Reporter: Patrick Hunt
> Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that
> during server recovery
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next()
> does not indicate whether the available logs are valid. In some cases (say,
> a truncated record and a single txnlog in the datadir) we will not detect
> that the file is corrupt, as opposed to simply having reached its end.
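
The distinction drawn in the description, a clean end of file versus a record cut off partway, can be illustrated with a small sketch. It assumes a simplified, hypothetical layout (a 4-byte big-endian length followed by that many payload bytes), which is not ZooKeeper's actual txnlog format. RecordScan, End, and Result are illustrative names, not ZooKeeper classes.

```java
import java.nio.ByteBuffer;

/** Hypothetical scanner over a simplified log layout; NOT ZooKeeper's format or API. */
final class RecordScan {
    /** How the scan ended: exactly on a record boundary, or inside a record. */
    enum End { CLEAN_EOF, DAMAGED }

    /** Number of complete records read, plus the reason the scan stopped. */
    static final class Result {
        final int records;
        final End end;
        Result(int records, End end) { this.records = records; this.end = end; }
    }

    /** Assumed layout: repeated [int length][length payload bytes], big-endian. */
    static Result scan(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int records = 0;
        while (buf.hasRemaining()) {
            // Fewer than 4 bytes left: the length header itself was cut off.
            if (buf.remaining() < Integer.BYTES) {
                return new Result(records, End.DAMAGED);
            }
            int len = buf.getInt();
            // A negative length, or a payload longer than what is left, means damage.
            if (len < 0 || len > buf.remaining()) {
                return new Result(records, End.DAMAGED);
            }
            buf.position(buf.position() + len); // skip over the payload
            records++;
        }
        // Input ended exactly on a record boundary.
        return new Result(records, End.CLEAN_EOF);
    }
}
```

A caller that sees DAMAGED can report the file as suspect rather than silently treating the last complete record as the end of the log, which is the ambiguity the issue describes.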