[ https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829473#comment-13829473 ]
Vinay commented on ZOOKEEPER-1653:
----------------------------------
Thanks, Rakesh, for joining.
The updatingEpoch file will be deleted when the snapshot is loaded during
startup, and currentEpoch will be updated at that time, before the deletion.
So by the time the next snapshot is taken, this file should no longer exist in
any case.
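
The startup behaviour described above could be sketched roughly as follows. This is only an illustration, not the attached patch: the class name, the method name, the use of java.nio.file and the file name literals are assumptions made for the example, and the sketch overwrites currentEpoch directly rather than going through a temporary file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the startup check; names are illustrative only.
final class EpochRecoverySketch {

    // Returns the epoch to use after loading a snapshot whose last
    // processed zxid is lastProcessedZxid.
    static long recoverCurrentEpoch(Path dataDir, long lastProcessedZxid)
            throws IOException {
        Path epochFile = dataDir.resolve("currentEpoch");
        Path marker = dataDir.resolve("updatingEpoch"); // assumed marker name

        String text = new String(Files.readAllBytes(epochFile),
                StandardCharsets.UTF_8).trim();
        long currentEpoch = Long.parseLong(text);
        long epochOfZxid = lastProcessedZxid >>> 32; // high 32 bits hold the epoch

        if (Files.exists(marker)) {
            // The server stopped between takeSnapshot() and setCurrentEpoch():
            // update currentEpoch first, then delete the marker.
            if (epochOfZxid > currentEpoch) {
                Files.write(epochFile,
                        Long.toString(epochOfZxid).getBytes(StandardCharsets.UTF_8));
                currentEpoch = epochOfZxid;
            }
            Files.delete(marker);
        }

        if (epochOfZxid > currentEpoch) {
            // No marker explains the mismatch, so it is a genuine inconsistency.
            throw new IOException("The current epoch, " + currentEpoch
                    + ", is older than the last zxid, " + lastProcessedZxid);
        }
        return currentEpoch;
    }
}
```

With currentEpoch containing 1, the marker present and a last zxid of 8589934592, the sketch writes 2 to currentEpoch and removes the marker. Without the marker, the same input still fails with the original error.
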
> zookeeper fails to start because of inconsistent epoch
> ------------------------------------------------------
>
> Key: ZOOKEEPER-1653
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.5
> Reporter: Michi Mutsuzaki
> Assignee: Michi Mutsuzaki
> Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.3.4.patch,
> ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch
>
>
> It looks like QuorumPeer.loadDataBase() could fail if the server was
> restarted after zk.takeSnapshot() but before finishing
> self.setCurrentEpoch(newEpoch) in Learner.java.
> {code:java}
> case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>     zk.takeSnapshot();
>     self.setCurrentEpoch(newEpoch); // <<< got restarted here
>     snapshotTaken = true;
>     writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
>     break;
> {code}
> The server fails to start because currentEpoch is still 1 but the last
> processed zxid from the snapshot has been updated.
> {noformat}
> 2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk
> java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
> ...
> {noformat}
> {noformat}
> $ find datadir
> datadir
> datadir/version-2
> datadir/version-2/currentEpoch.tmp
> datadir/version-2/acceptedEpoch
> datadir/version-2/snapshot.0
> datadir/version-2/currentEpoch
> datadir/version-2/snapshot.200000000
> $ cat datadir/version-2/currentEpoch.tmp
> 2%
> $ cat datadir/version-2/acceptedEpoch
> 2%
> $ cat datadir/version-2/currentEpoch
> 1%
> {noformat}
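
For reference, the numbers in the log and the directory listing above line up once the zxid is split into its epoch (high 32 bits) and counter (low 32 bits). The helper below is a small sketch with hypothetical class and method names, not code from the ZooKeeper source tree:

```java
// Hypothetical helper showing how a zxid maps to an epoch.
final class ZxidEpochSketch {

    // The epoch is stored in the high 32 bits of the zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // The per-epoch counter is stored in the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // True when the stored currentEpoch lags behind the snapshot's zxid,
    // which is the condition reported in the error message.
    static boolean epochIsStale(long currentEpoch, long lastProcessedZxid) {
        return epochOf(lastProcessedZxid) > currentEpoch;
    }
}
```

Here 8589934592 is 0x200000000, which matches the suffix of snapshot.200000000. Its epoch is 2 and its counter is 0, so a currentEpoch of 1 is one epoch behind, while the value 2 in currentEpoch.tmp and acceptedEpoch is consistent with the snapshot.
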