[jira] [Created] (ZOOKEEPER-1118) Inconsistent data after server crashes several times

Kurt Young (JIRA) Tue, 05 Jul 2011 19:58:47 -0700

Inconsistent data after server crashes several times
----------------------------------------------------


                 Key: ZOOKEEPER-1118
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1118
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.3.2
         Environment: Redhat RHEL5
            Reporter: Kurt Young
            Priority: Critical


I think there is a bug in how a Follower syncs its data with the Leader.
Assume some operations were committed while one server was down.
When that server restarts, it receives a NEWLEADER packet that includes the
leader's last zxid, and the server sets its own lastProcessZxid to the
leader's.
{code:title=Follower.java|borderStyle=solid}
void followLeader() throws InterruptedException {
    fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean);
    try {
        InetSocketAddress addr = findLeader();
        try {
            connectToLeader(addr);
            // get the last zxid from leader
            long newLeaderZxid = registerWithLeader(Leader.FOLLOWERINFO);
            //check to see if the leader zxid is lower than ours
            //this should never happen but is just a safety check
            long lastLoggedZxid = self.getLastLoggedZxid();
            if ((newLeaderZxid >> 32L) < (lastLoggedZxid >> 32L)) {
                LOG.fatal("Leader epoch " + Long.toHexString(newLeaderZxid >> 32L)
                        + " is less than our epoch "
                        + Long.toHexString(lastLoggedZxid >> 32L));
                throw new IOException("Error: Epoch of leader is lower");
            }
            // set its own lastProcessZxid to leader's last zxid
            syncWithLeader(newLeaderZxid);
{code}
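
The epoch check in the excerpt above relies on the zxid layout: the high 32 bits hold the leader epoch and the low 32 bits hold a counter within that epoch. Here is a minimal sketch of that decomposition. The class and method names (ZxidSketch, epochOf, counterOf, leaderEpochIsLower) are hypothetical and are not part of ZooKeeper.

```java
public class ZxidSketch {

    // High 32 bits of a zxid: the leader epoch.
    public static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    // Low 32 bits of a zxid: the counter within that epoch.
    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Same comparison as the safety check in followLeader():
    // true when the leader's epoch is older than the follower's.
    public static boolean leaderEpochIsLower(long newLeaderZxid, long lastLoggedZxid) {
        return epochOf(newLeaderZxid) < epochOf(lastLoggedZxid);
    }

    public static void main(String[] args) {
        long leaderZxid = (2L << 32) | 5L; // made-up value: epoch 2, counter 5
        long ourZxid = (3L << 32) | 1L;    // made-up value: epoch 3, counter 1
        System.out.println("leader epoch 0x" + Long.toHexString(epochOf(leaderZxid))
                + ", counter 0x" + Long.toHexString(counterOf(leaderZxid)));
        System.out.println("leader epoch is lower: "
                + leaderEpochIsLower(leaderZxid, ourZxid));
    }
}
```

Note that the check compares epochs only. Two zxids in the same epoch pass it regardless of their counters, so it says nothing about whether the follower's data has caught up.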

The server then receives some COMMIT packets so that it can sync its data with
the leader. After that, the leader sends an UPTODATE packet, which makes the
server take a snapshot.
{code:title=Follower.java|borderStyle=solid}
protected void processPacket(QuorumPacket qp) throws IOException{
    switch (qp.getType()) {
    case Leader.PING:
        ping(qp);
        break;
    case Leader.PROPOSAL:
        TxnHeader hdr = new TxnHeader();
        BinaryInputArchive ia = BinaryInputArchive
        .getArchive(new ByteArrayInputStream(qp.getData()));
        Record txn = SerializeUtils.deserializeTxn(ia, hdr);
        if (hdr.getZxid() != lastQueued + 1) {
            LOG.warn("Got zxid 0x"
                    + Long.toHexString(hdr.getZxid())
                    + " expected 0x"
                    + Long.toHexString(lastQueued + 1));
        }
        lastQueued = hdr.getZxid();
        fzk.logRequest(hdr, txn);
        break;
    case Leader.COMMIT:
        fzk.commit(qp.getZxid());
        break;
    case Leader.UPTODATE:
        fzk.takeSnapshot();
        self.cnxnFactory.setZooKeeperServer(fzk);
        break;
    case Leader.REVALIDATE:
        revalidate(qp);
        break;
    case Leader.SYNC:
        fzk.sync();
        break;
    }
}
{code}
Notice how differently the Follower treats COMMIT and UPTODATE packets. When it
receives a COMMIT packet, the follower hands it to a processor to deal with.
But when it receives an UPTODATE packet, the follower takes a snapshot
immediately. So the server may take the snapshot before it has committed all
the operations it missed. If the server then crashes again and recovers, it
restores its data from that snapshot, so its data is now inconsistent with the
leader's even though its last zxid is the same.
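
The ordering problem described above can be reproduced in a small, deterministic toy model. This is only a sketch under stated assumptions: ToyFollower, its fields, and snapshotMissesTxns are hypothetical names, the pending-commit queue stands in for the asynchronous processor, and none of this is ZooKeeper code. Draining the queue before the snapshot is shown only to contrast the two orderings; it is not presented as the actual fix.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

public class SnapshotRaceSketch {

    /** Toy follower: just enough state to show the ordering problem. */
    static class ToyFollower {
        final Map<Long, String> data = new TreeMap<Long, String>();  // applied txns by zxid
        final Queue<Long> pendingCommits = new ArrayDeque<Long>();   // queued, not yet applied
        long lastZxid;                                               // zxid the follower claims
        Map<Long, String> snapshotData;
        long snapshotZxid;

        // During sync the follower adopts the leader's last zxid up front.
        void syncWithLeader(long leaderLastZxid) { lastZxid = leaderLastZxid; }

        // COMMIT is only handed to a processor; nothing is applied yet.
        void commit(long zxid) { pendingCommits.add(zxid); }

        // Stand-in for the processor eventually applying the queued commits.
        void drainCommits() {
            Long zxid;
            while ((zxid = pendingCommits.poll()) != null) {
                data.put(zxid, "txn-" + zxid);
            }
        }

        // UPTODATE: the snapshot is taken immediately from whatever is applied so far.
        void takeSnapshot() {
            snapshotData = new TreeMap<Long, String>(data);
            snapshotZxid = lastZxid;
        }
    }

    /** Returns true if the snapshot claims the leader's zxid but lacks some txns. */
    public static boolean snapshotMissesTxns(boolean drainBeforeSnapshot) {
        ToyFollower f = new ToyFollower();
        f.data.put(1L, "txn-1");   // state from before the crash
        f.syncWithLeader(3L);      // leader's last zxid is 3
        f.commit(2L);              // missed operations arrive as COMMITs
        f.commit(3L);
        if (drainBeforeSnapshot) {
            f.drainCommits();
        }
        f.takeSnapshot();          // UPTODATE
        f.drainCommits();          // processor catches up, too late for the snapshot
        return f.snapshotZxid == 3L && f.snapshotData.size() < 3;
    }

    public static void main(String[] args) {
        System.out.println("snapshot first:  missing txns = " + snapshotMissesTxns(false));
        System.out.println("commits first:   missing txns = " + snapshotMissesTxns(true));
    }
}
```

In the first ordering the snapshot records the leader's zxid while containing only the pre-crash transaction, which is the state a second crash would restore from.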

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
