Fangmin Lv created ZOOKEEPER-2845:
-------------------------------------

             Summary: Data inconsistency issue due to retain database in leader 
election
                 Key: ZOOKEEPER-2845
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.5.3, 3.4.10
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv
            Priority: Critical


In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time
during leader election. In a ZooKeeper ensemble, it's possible that the snapshot
is ahead of the txn file (due to a slow disk on the server, etc.), or that the
txn file is ahead of the snapshot because no commit message has been received yet.

If the snapshot is ahead of the txn file, the SyncRequestProcessor queue is
drained during shutdown, so the snapshot and txn file are consistent again
before leader election happens, and this is not an issue.

But if the txn file is ahead of the snapshot, the ensemble can end up with
inconsistent data. Here is a simplified scenario that shows the issue:

Let's say we have 3 servers in the ensemble, servers A and B are followers
and C is the leader, and all snapshots and txn files are up to T0:
1. A new request reaches leader C to create Node N, and it is converted to
txn T1.
2. Txn T1 is synced to disk on C, but just before the proposal reaches
the followers, A and B restart, so T1 does not exist on A and B.
3. A and B form a new quorum after the restart; let's say B is the leader.
4. C changes to the LOOKING state because it does not have enough followers.
It syncs with leader B with last zxid T0, which results in an empty diff sync.
5. Before C takes a snapshot it restarts and replays the txns on disk, which
include T1. Now C has Node N, but A and B do not.
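
The five steps above can be sketched as a toy model. This is only an
illustration of the scenario, not ZooKeeper code: the Server class, its
fields, and run_scenario() are hypothetical names, zxids T0 and T1 are modeled
as the integers 0 and 1, and the sync step assumes that a server with a
retained database reports its in-memory zxid rather than what is on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy model of one ensemble member (hypothetical, not ZooKeeper code)."""
    name: str
    snapshot_zxid: int = 0                        # zxid covered by the on-disk snapshot
    snapshot_nodes: set = field(default_factory=set)
    txn_log: list = field(default_factory=list)   # on-disk (zxid, node) entries
    db: set = field(default_factory=set)          # in-memory data tree
    last_zxid: int = 0                            # last zxid applied in memory

    def log_to_disk(self, zxid, node):
        # The proposal is written to the txn file but not committed in memory.
        self.txn_log.append((zxid, node))

    def restart(self):
        # Rebuild memory from the snapshot, then replay every logged txn.
        self.db = set(self.snapshot_nodes)
        self.last_zxid = self.snapshot_zxid
        for zxid, node in self.txn_log:
            if zxid > self.last_zxid:
                self.db.add(node)
                self.last_zxid = zxid

    def sync_with(self, leader):
        # Assumption: with the retained DB, the in-memory zxid is reported.
        return "empty diff" if self.last_zxid == leader.last_zxid else "non-empty sync"


def run_scenario():
    a, b, c = Server("A"), Server("B"), Server("C")  # everything is up to T0
    c.log_to_disk(1, "N")        # steps 1-2: T1 reaches C's disk only
    a.restart()                  # step 2: A and B restart without T1
    b.restart()
    leader = b                   # step 3: A and B form a quorum, B leads
    sync = c.sync_with(leader)   # step 4: C keeps its DB and reports T0
    c.restart()                  # step 5: C replays its txn file, including T1
    return a, b, c, sync


if __name__ == "__main__":
    a, b, c, sync = run_scenario()
    print(sync, sorted(a.db), sorted(b.db), sorted(c.db))
    # prints: empty diff [] [] ['N']
```

In this model C ends up with Node N while A and B do not, even though the
sync in step 4 reported nothing to reconcile.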

I have also included a test case that reproduces this issue consistently.

We have a totally different RetainDB version that avoids this issue by
reaching consensus between the snapshot and txn files before leader election;
we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
