Nicholas Wolchko created ZOOKEEPER-2418:
-------------------------------------------

             Summary: txnlog diff sync can skip sending some transactions to 
followers
                 Key: ZOOKEEPER-2418
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2418
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.1
            Reporter: Nicholas Wolchko
            Priority: Critical


If the leader is having disk issues so that its on disk txnlog is behind the in 
memory commit log, it will send a DIFF that is missing the transactions in 
between the two.

Example:
There are 5 hosts in the cluster. 1 is the leader. 5 is disconnected.
We commit up to zxid 1000.
At zxid 450, the leader's disk stalls, but we still commit transactions because 
2,3,4 are up and acking writes.
At zxid 1000, the txnlog on the leader has 1-450 and the commit log has 
500-1000.
Then host 5 regains its connection to the cluster and syncs with the leader. It 
will receive a DIFF containing zxids 1-450 and 500-1000.

This is because queueCommittedProposals in the LearnerHandler just queues 
everything within its zxid range. It doesn't give an error if there is a gap 
between peerLastZxid and the iterator it is queueing from.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to