Erik Forsberg created ZOOKEEPER-1546:
----------------------------------------
Summary: "Unable to load database on disk" when restarting after
node freeze
Key: ZOOKEEPER-1546
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1546
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.5
Reporter: Erik Forsberg
One of my zookeeper servers in a quorum of 3 froze (probably due to underlying
hardware problems). When restarting, zookeeper fails to start with the
following in zookeeper.log:
{noformat}
2012-09-04 09:02:35,300 - INFO [main:QuorumPeerConfig@90] - Reading
configuration from: /etc/zookeeper/zoo.cfg
2012-09-04 09:02:35,316 - INFO [main:QuorumPeerConfig@310] - Defaulting to
majority quorums
2012-09-04 09:02:35,333 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
2012-09-04 09:02:35,358 - INFO [main:NIOServerCnxn$Factory@143] - binding to
port 0.0.0.0/0.0.0.0:2181
2012-09-04 09:02:35,379 - INFO [main:QuorumPeer@819] - tickTime set to 2000
2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@830] - minSessionTimeout set
to -1
2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@841] - maxSessionTimeout set
to -1
2012-09-04 09:02:35,386 - INFO [main:QuorumPeer@856] - initLimit set to 10
2012-09-04 09:02:35,523 - INFO [main:FileSnap@82] - Reading snapshot
/var/zookeeper/version-2/snapshot.500017240
2012-09-04 09:02:38,944 - ERROR [main:FileTxnSnapLog@226] - Failed to increment
parent cversion for:
/osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
for
/osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at
org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218)
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:224)
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152)
at
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
2012-09-04 09:02:38,945 - FATAL [main:QuorumPeer@400] - Unable to load database
on disk
java.io.IOException: Failed to process transaction type: 2 error:
KeeperErrorCode = NoNode for
/osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154)
at
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
2012-09-04 09:02:38,946 - FATAL [main:QuorumPeerMain@87] - Unexpected
exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.io.IOException: Failed to process transaction type: 2 error:
KeeperErrorCode = NoNode for
/osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154)
at
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
... 3 more
{noformat}
Removing data from /var/zookeeper/version-2 then restart seems to "fix" the
problem (it gets a snapshot from one of the other nodes in the quorum).
This is Zookeeper 3.3.5+19.5-1~squeeze-cdh3, i.e. from Cloudera's distribution.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira