[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765472#comment-13765472
 ] 

Sergey Maslyakov commented on ZOOKEEPER-1747:
---------------------------------------------

Let me take back my original statement about "force-ignoring" errors. I think 
the ZooKeeper server should handle data consistency issues gracefully. This 
means it needs to handle this type of error rather than terminate. The 
reaction to an error can be controlled by the user.

# For fatal errors, such as a missing {{myid}} file, the ZK server shall exit.
# For non-fatal data consistency errors (empty log, missing epoch files, etc.), 
ZK can be instructed to:
## Come up empty
## Make a "best effort" at restoring the DataTree. If no data can be restored 
consistently, ZK can be instructed to:
### Come up empty
### Exit

This way, a system operator who is not a ZK expert can be given a set of work 
instructions on how to recover a failing ZK service.
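
The decision tree outlined above could be sketched roughly as follows. This is 
only an illustration of the proposal, not existing ZooKeeper code: the class, 
the enums, and the {{decide}} method are all hypothetical names, and no such 
configuration option exists in 3.4.5 as far as this comment assumes.

```java
public class RecoveryPolicySketch {

    /** Hypothetical classification of start-up errors (not a ZooKeeper type). */
    public enum ErrorKind {
        FATAL,      // e.g. missing myid file
        NON_FATAL   // e.g. empty log, missing epoch files
    }

    /** Hypothetical operator setting for non-fatal data consistency errors. */
    public enum OnDataError {
        COME_UP_EMPTY,           // discard local data and start empty
        BEST_EFFORT_THEN_EMPTY,  // try to restore; start empty if that fails
        BEST_EFFORT_THEN_EXIT    // try to restore; exit if that fails
    }

    /** What the server would do next. */
    public enum Action {
        EXIT,
        START_EMPTY,
        START_WITH_RESTORED_DATA
    }

    /**
     * Maps an error and the operator's setting to an action.
     *
     * @param restoredConsistently whether a best-effort restore of the
     *        DataTree produced a consistent result; ignored unless a
     *        best-effort setting is selected
     */
    public static Action decide(ErrorKind kind,
                                OnDataError setting,
                                boolean restoredConsistently) {
        // Fatal errors always terminate the server, whatever the setting.
        if (kind == ErrorKind.FATAL) {
            return Action.EXIT;
        }
        switch (setting) {
            case COME_UP_EMPTY:
                return Action.START_EMPTY;
            case BEST_EFFORT_THEN_EMPTY:
                return restoredConsistently
                        ? Action.START_WITH_RESTORED_DATA
                        : Action.START_EMPTY;
            case BEST_EFFORT_THEN_EXIT:
                return restoredConsistently
                        ? Action.START_WITH_RESTORED_DATA
                        : Action.EXIT;
            default:
                throw new IllegalArgumentException("Unknown setting: " + setting);
        }
    }

    public static void main(String[] args) {
        // A corrupted transaction log where nothing could be restored.
        System.out.println(
                decide(ErrorKind.NON_FATAL, OnDataError.BEST_EFFORT_THEN_EMPTY, false));
    }
}
```

Keeping the decision in one pure function would make it straightforward to 
document each setting as a line in an operator's work instructions.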
                
> Zookeeper server fails to start if transaction log file is corrupted
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1747
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Solaris10/x86, Java 1.6
>            Reporter: Sergey Maslyakov
>
> On multiple occasions when ZK was not able to write out a transaction log or 
> a snapshot file, the subsequent attempt to restart the server failed. This 
> usually happens when the underlying file system fills up, preventing the ZK 
> server from writing out a consistent data file.
> Upon start-up, the server reads in the snapshot and the transaction log. If 
> the deserializer fails and throws an exception, the server terminates. Please 
> see the stack trace below.
> A server not coming up for whatever reason is often an undesirable condition. 
> It would be nice to have an option to force-ignore parsing errors, 
> especially in the transaction log. A checksum on the data could be a 
> possible solution to ensure integrity and "parsability".
> Another robustness enhancement could be proper handling of the condition 
> when a snapshot or transaction log cannot be completely written to disk. 
> Basically, better handling of write errors.
> {noformat}
> 2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
>         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {noformat}

