[ https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-1653:
---------------------------------------

    Attachment: ZOOKEEPER-1653.3.4.patch

Thank you all for the feedback. I addressed all the comments except the one 
about using variable substitution (slf4j-style {} placeholders) for logging in 
QuorumPeerMainTest. QuorumPeerMainTest uses log4j.WriterAppender and hasn't 
been migrated to slf4j, so I'm leaving its logging as string concatenation 
for now.
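
As background for the logging point: with slf4j one writes something like 
LOG.info("Last zxid {}", zxid), and the logger fills in the {} placeholder only 
if INFO is enabled, whereas a log4j 1.x Logger.info(Object) call receives an 
already-built message, so string concatenation is the natural fit there. The 
sketch below is a hypothetical, dependency-free illustration of that 
placeholder substitution and deferred formatting. PlaceholderSketch, substitute 
and info are made-up names, and this is not slf4j's implementation (for 
example, it does not handle escaped placeholders).

```java
public class PlaceholderSketch {

    // Hypothetical helper: replace each "{}" in pattern with the next
    // argument, left to right. Extra placeholders are left as-is and
    // extra arguments are ignored.
    public static String substitute(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        while (argIndex < args.length) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break;
            }
            sb.append(pattern, from, at).append(args[argIndex++]);
            from = at + 2;
        }
        sb.append(pattern.substring(from));
        return sb.toString();
    }

    // Deferred formatting: the message string is only built when the
    // level is enabled, which is the main benefit of placeholders.
    public static void info(boolean enabled, String pattern, Object... args) {
        if (enabled) {
            System.out.println(substitute(pattern, args));
        }
    }

    public static void main(String[] args) {
        long epoch = 2;
        long zxid = 8589934592L;
        String concatenated = "epoch " + epoch + ", zxid " + zxid;
        String substituted = substitute("epoch {}, zxid {}", epoch, zxid);
        System.out.println(concatenated.equals(substituted)); // prints true
        info(false, "never formatted: {}", zxid); // prints nothing
    }
}
```

Both styles produce the same text; the difference is only when (and whether) 
the string is built.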

> zookeeper fails to start because of inconsistent epoch
> ------------------------------------------------------
>
>                 Key: ZOOKEEPER-1653
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Assignee: Michi Mutsuzaki
>            Priority: Blocker
>             Fix For: 3.4.6
>
>         Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.3.4.patch, 
> ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch
>
>
> It looks like QuorumPeer.loadDataBase() could fail if the server was 
> restarted after zk.takeSnapshot() but before finishing 
> self.setCurrentEpoch(newEpoch) in Learner.java.
> {code:java}
> case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>     zk.takeSnapshot();
>     self.setCurrentEpoch(newEpoch); // <<< got restarted here
>     snapshotTaken = true;
>     writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
>     break;
> {code}
> The server fails to start because currentEpoch is still 1, but the last 
> processed zxid from the snapshot has already been updated to 8589934592 
> (0x200000000, i.e. epoch 2 in the high 32 bits of the zxid).
> {noformat}
> 2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
> java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
>         ...
> {noformat}
> {noformat}
> $ find datadir                                     
> datadir
> datadir/version-2
> datadir/version-2/currentEpoch.tmp
> datadir/version-2/acceptedEpoch
> datadir/version-2/snapshot.0
> datadir/version-2/currentEpoch
> datadir/version-2/snapshot.200000000
> $ cat datadir/version-2/currentEpoch.tmp
> 2%
> $ cat datadir/version-2/acceptedEpoch
> 2%
> $ cat datadir/version-2/currentEpoch
> 1%
> {noformat}
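
A minimal Java sketch of the failure mode described in the issue. It is a 
hypothetical model, not ZooKeeper's code, and it assumes: the usual zxid 
layout (epoch in the high 32 bits, counter in the low 32 bits); snapshot files 
named with the zxid in hex, as in the listing (snapshot.200000000); and that 
currentEpoch is written to currentEpoch.tmp and then renamed into place, which 
would explain the leftover currentEpoch.tmp holding 2. All class and method 
names are illustrative; ZooKeeper's own ZxidUtils class has comparable epoch 
helpers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical model of ZOOKEEPER-1653, not ZooKeeper's implementation.
public class EpochCrashSketch {

    // Epoch lives in the high 32 bits of a zxid.
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Assumed naming convention: "snapshot." + hex zxid.
    public static long zxidFromSnapshotName(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('.') + 1), 16);
    }

    // Assumed write path: write name.tmp, then rename it over name.
    // crashBeforeRename = true stops after the tmp file is written,
    // mimicking a restart in the middle of setCurrentEpoch().
    public static void writeLongToFile(Path dir, String name, long value,
            boolean crashBeforeRename) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Files.write(tmp, Long.toString(value).getBytes(StandardCharsets.UTF_8));
        if (crashBeforeRename) {
            return;
        }
        Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static long readLongFromFile(Path dir, String name) throws IOException {
        byte[] bytes = Files.readAllBytes(dir.resolve(name));
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
    }

    // Load-time check modeled on the error message in the log.
    public static void checkEpochs(long currentEpoch, long lastProcessedZxid)
            throws IOException {
        if (epochOf(lastProcessedZxid) > currentEpoch) {
            throw new IOException("The current epoch, " + currentEpoch
                    + ", is older than the last zxid, " + lastProcessedZxid);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("version-2");
        // Before NEWLEADER: currentEpoch is 1; acceptedEpoch is already 2.
        writeLongToFile(dir, "currentEpoch", 1, false);
        writeLongToFile(dir, "acceptedEpoch", 2, false);
        // zk.takeSnapshot(): the snapshot reflects a zxid from epoch 2.
        String snapshotName = "snapshot." + Long.toHexString(makeZxid(2, 0));
        Files.createFile(dir.resolve(snapshotName));
        // self.setCurrentEpoch(2) is interrupted before the rename.
        writeLongToFile(dir, "currentEpoch", 2, true);

        // Restart: compare the persisted epoch with the snapshot's zxid.
        long currentEpoch = readLongFromFile(dir, "currentEpoch");
        long lastZxid = zxidFromSnapshotName(snapshotName);
        try {
            checkEpochs(currentEpoch, lastZxid);
            System.out.println("loaded");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the same message as the log: "The current epoch, 1, is 
older than the last zxid, 8589934592". In this model, writing through a temp 
file keeps currentEpoch itself from being half-written, but it does not make 
the snapshot and the epoch update a single atomic step, which is the window 
this issue is about.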



--
This message was sent by Atlassian JIRA
(v6.1#6144)
