[ https://issues.apache.org/jira/browse/ZOOKEEPER-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765392#comment-13765392 ]
Germán Blanco commented on ZOOKEEPER-1747:
------------------------------------------

We have also seen this issue with an inconsistent state between acceptedEpoch, currentEpoch and the transaction log. In that case the error is:

{noformat}
2013-09-12 12:30:51,586 [myid:10] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: The current epoch, 6, is older than the last zxid, 34359738487
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:435)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2013-09-12 12:30:51,587 [myid:10] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: The current epoch, 6, is older than the last zxid, 34359738487
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:435)
    ... 4 more
{noformat}

I guess "force-ignore" means that the server just ignores whatever is on disk and starts with zxid=0. Is that right?
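
For reference, the numbers in that message can be decoded by hand. A zxid is a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits, so 34359738487 is 0x800000077, i.e. epoch 8, counter 119. The stored currentEpoch (6) is lower than the epoch of the last logged transaction (8), which is the inconsistency the check reports. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not ZooKeeper's own API.

```java
// Illustrative helper, not part of ZooKeeper: splits a zxid into its two halves.
final class ZxidDecode {
    // The epoch occupies the high 32 bits of the zxid.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // The counter occupies the low 32 bits of the zxid.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastZxid = 34359738487L; // value taken from the log above
        long currentEpoch = 6L;       // value taken from the log above

        System.out.println("epoch=" + epochOf(lastZxid)
                + " counter=" + counterOf(lastZxid));
        // The start-up check fails when the zxid's epoch is newer than currentEpoch.
        System.out.println("consistent=" + (epochOf(lastZxid) <= currentEpoch));
    }
}
```

Run as is, this prints `epoch=8 counter=119` followed by `consistent=false`.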
> Zookeeper server fails to start if transaction log file is corrupted
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1747
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Solaris10/x86, Java 1.6
>            Reporter: Sergey Maslyakov
>
> On multiple occasions when ZK was not able to write out a transaction log or
> a snapshot file, the consequent attempt to restart the server fails. Usually
> it happens when the underlying file system filled up; thus, preventing ZK
> server from writing out consistent data file.
> Upon start-up, the server reads in the snapshot and the transaction log. If
> the deserializer fails and throws an exception, server terminates. Please see
> the stack trace below.
> Server not coming up for whatever reason is often an undesirable condition.
> It would be nice to have an option to force-ignore parsing errors,
> especially, in the transaction log. A check sum on the data could be a
> possible solution to ensure the integrity and "parsability".
> Another robustness enhancement could be via proper handling of the condition
> when snapshot or transaction log cannot be completely written to disk.
> Basically, better handling of write errors.
> {noformat}
> 2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {noformat}
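
On the checksum and "force-ignore" ideas in the description, the following is a rough sketch of what tolerant reading could look like: each record carries a CRC32 and a length, and the reader returns the valid prefix, stopping at the first truncated or corrupt record instead of throwing. The record layout, the `ChecksummedLog` class and the `MAX_RECORD_BYTES` limit are assumptions made for illustration; this is not ZooKeeper's actual on-disk format or code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical record format: [8-byte CRC32][4-byte length][payload bytes].
final class ChecksummedLog {
    // Assumed sanity limit so a garbage length field cannot trigger a huge allocation.
    static final int MAX_RECORD_BYTES = 1 << 20;

    static void append(DataOutputStream out, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeLong(crc.getValue());
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Returns every record up to, but not including, the first bad one.
    static List<byte[]> readValidPrefix(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<byte[]>();
        try {
            while (true) {
                long expected = in.readLong();
                int length = in.readInt();
                if (length < 0 || length > MAX_RECORD_BYTES) {
                    break; // implausible length: treat as corruption
                }
                byte[] payload = new byte[length];
                in.readFully(payload); // throws EOFException on a short record
                CRC32 crc = new CRC32();
                crc.update(payload, 0, length);
                if (crc.getValue() != expected) {
                    break; // checksum mismatch: stop at the last good record
                }
                records.add(payload);
            }
        } catch (EOFException endOrTruncated) {
            // Clean end of file or a partially written tail; keep what was read.
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        append(out, "first".getBytes("UTF-8"));
        append(out, "second".getBytes("UTF-8"));

        // Simulate a disk-full condition by cutting off the tail of the last record.
        byte[] full = buf.toByteArray();
        byte[] truncated = Arrays.copyOf(full, full.length - 3);

        List<byte[]> recovered =
                readValidPrefix(new DataInputStream(new ByteArrayInputStream(truncated)));
        System.out.println("recovered records: " + recovered.size());
    }
}
```

Dropping a damaged tail is only safe if the discarded transactions can be recovered some other way, for example by resynchronizing from the rest of the ensemble; on a standalone server it could silently lose acknowledged writes, so any such option would probably need to be opt-in.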