Calling shutdown() on a QuorumPeer too quickly can lead to a corrupt log
------------------------------------------------------------------------
Key: ZOOKEEPER-1069
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1069
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.3.3
Environment: Linux, ZK 3.3.3, 3-node cluster.
Reporter: Jeremy Stribling
Priority: Minor
I've only seen this happen once. In order to restart Zookeeper with a new set
of servers, we have a wrapper class that calls shutdown() on an existing
QuorumPeer, and then starts a new one with a new set of servers. Specifically,
our shutdown code looks like this:
{code}
synchronized(_quorum_peer) {
_quorum_peer.shutdown();
FastLeaderElection fle = (FastLeaderElection) _quorum_peer.getElectionAlg();
fle.shutdown(); // I think this is unnecessary
try {
_quorum_peer.getTxnFactory().commit();
} catch (java.nio.channels.ClosedChannelException e) {
// ignore
}
}
{code}
One time, our wrapper class started one QuorumPeer, and then had to shut it
down and start a new one very soon after the QuorumPeer transitioned into a
FOLLOWING state. When the new QuorumPeer tried to read in the latest log from
disk, it encountered a bogus magic number of all zeroes:
{noformat}
2011-05-18 22:42:29,823 10467 [pool-1-thread-2] FATAL
org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk
java.io.IOException: Transaction log:
/var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0
!= 1514884167
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454)
at
org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325)
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126)
at
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
...
2011-05-18 22:42:29,823 10467 [pool-1-thread-2] ERROR
com.nicira.onix.zookeeper.Zookeeper - Unexpected exception
java.lang.RuntimeException: Unable to run quorum server
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401)
at
com.nicira.onix.zookeeper.Zookeeper.StartZookeeper(Zookeeper.java:198)
at
com.nicira.onix.zookeeper.Zookeeper.RestartZookeeper(Zookeeper.java:277)
at com.nicira.onix.zookeeper.ZKRPCService.setServers(ZKRPC.java:83)
at
com.nicira.onix.zookeeper.Zkrpc$ZKRPCService.callMethod(Zkrpc.java:8198)
at com.nicira.onix.rpc.RPC$10.run(RPC.java:534)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Transaction log:
/var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0
!= 1514884167
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479)
at
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454)
at
org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325)
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126)
at
org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
... 8 more
{noformat}
I looked into the code a bit, and I believe the problem comes from the fact
that QuorumPeer.shutdown() does not join() on this before returning. Here's
the scenario I think can happen:
# QuorumPeer.run() notices it is in the FOLLOWING state, makes a new Follower,
and calls Follower.followLeader(), which starts connecting to the leader.
# In the main program thread, QuorumPeer.shutdown() is called.
# Through a complicated series of calls, this eventually leads to
FollowerZooKeeperServer.shutdown() being called.
# This method calls SyncRequestProcess.shutdown(), which joins on this and
returns. However, it's possible that the SyncRequestProcessor thread hasn't
yet been started because followLeader() hasn't yet called
Learner.syncWithLeader(), which hasn't yet called ZooKeeperServer.startup(),
which actually starts the thread. Thus, the join would have no request, though
a requestOfDeath is added to the queued requests list (possibly behind other
requests).
# Back in the main thread, FileTxnSnapLog.commit() is called, which doesn't do
much because the processor hasn't processed anything yet.
# Finally, ZooKeeperServer.startup is called in the QuorumPeer.run() thread,
starting up the SyncRequestProcessor thread.
# That thread appends some request to the log. The log doesn't exist yet, so
it creates a new one, padding it with zeroes.
# Now either the SyncRequestProcessor hits the requestOfDeath or the whole
QuorumPeer object is deleted. It exits that thread without ever committing the
log to disk (or the new QuorumPeer tries to read the log before the old thread
gets to commit anything), and the log ends up with all zeroes instead of a
proper magic number.
I haven't yet looked into whether there's an easy way to join() on the
QuorumPeer thread from shutdown(), so that it won't go on to start the
processor threads after it's been shutdown. I wanted to check with the group
first and see if anyone else agrees this could be a problem.
I marked this as minor since I think almost no one else uses Zookeeper this
way, but it's pretty important to me personally.
I will upload a log file showing this behavior shortly.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira