[ https://issues.apache.org/jira/browse/ZOOKEEPER-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265345#comment-15265345 ]

Flavio Junqueira commented on ZOOKEEPER-2418:
---------------------------------------------

Got it, I think you're referring to this logic in {{LearnerHandler}}:

{noformat}
                    LOG.info("Use txnlog and committedLog for peer sid: " + getSid());
                    currentZxid = queueCommittedProposals(txnLogItr, peerLastZxid,
                                                         minCommittedLog, maxCommittedLog);

                    LOG.debug("Queueing committedLog 0x" + Long.toHexString(currentZxid));
                    Iterator<Proposal> committedLogItr = db.getCommittedLog().iterator();
                    currentZxid = queueCommittedProposals(committedLogItr, currentZxid,
                                                         null, maxCommittedLog);
{noformat}

We are currently assuming that the two sequences (txn log and committed log)
overlap, but as your example rightly shows, that may not be the case. I see
three options here:

# Fail the follower's attempt to join with an error
# Block new commits until the leader catches up
# Leader drops leadership
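A minimal sketch of the first option, assuming zxids can be modeled as plain ascending {{long}} values and that the txn log already covers {{peerLastZxid}}. {{DiffPlanner}}, {{planDiff}} and {{GapException}} are hypothetical names, not ZooKeeper APIs; the two lists stand in for the {{txnLogItr}} and committed-log iterators in the snippet above. Instead of queueing whatever falls in range, the planner refuses the DIFF when the txn log ends before the committed log begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not LearnerHandler's actual code: build a DIFF from the
// on-disk txn log plus the in-memory committed log, and fail the sync (option 1)
// instead of silently sending a DIFF with a hole in it.
class DiffPlanner {

    // Unchecked so a caller can catch it and, for example, fall back to a full
    // snapshot sync or close the follower's connection.
    static class GapException extends IllegalStateException {
        GapException(String msg) {
            super(msg);
        }
    }

    /**
     * @param txnLogZxids    zxids read from the on-disk txn log, ascending; assumed
     *                       to start at or before peerLastZxid
     * @param committedZxids zxids held in the in-memory committed log, ascending, non-empty
     * @param peerLastZxid   last zxid the follower already has
     * @return zxids to send to the follower, ascending, without duplicates
     */
    static List<Long> planDiff(List<Long> txnLogZxids, List<Long> committedZxids,
                               long peerLastZxid) {
        long minCommitted = committedZxids.get(0);
        List<Long> diff = new ArrayList<>();

        if (peerLastZxid < minCommitted) {
            // The txn log must bridge the follower up to the committed log. Requiring
            // overlap (last txn-log zxid >= minCommitted) is conservative: it may reject
            // an exactly adjacent pair, but it never accepts a gap, and it avoids
            // reasoning about zxid epoch boundaries.
            long lastTxn = txnLogZxids.isEmpty()
                    ? Long.MIN_VALUE
                    : txnLogZxids.get(txnLogZxids.size() - 1);
            if (lastTxn < minCommitted) {
                throw new GapException("txn log ends at 0x" + Long.toHexString(lastTxn)
                        + " but committed log starts at 0x" + Long.toHexString(minCommitted));
            }
            // Phase 1: txn-log entries the follower lacks, up to the committed log.
            for (long z : txnLogZxids) {
                if (z > peerLastZxid && z < minCommitted) {
                    diff.add(z);
                }
            }
        }

        // Phase 2: every committed-log entry newer than what the follower has.
        for (long z : committedZxids) {
            if (z > peerLastZxid) {
                diff.add(z);
            }
        }
        return diff;
    }
}
```

With a txn log of 1-4 and a committed log of 3-5, a follower at zxid 1 gets 2, 3, 4, 5; with a txn log of 1-2 and a committed log of 5-6, {{planDiff}} throws rather than quietly skipping 3 and 4.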

The first option is better than having a gap, but I'm concerned that the leader
might end up with a gap for a long time and keep dropping followers. The second
forces the leader to catch up, but the third may be preferable when the leader
is lagging behind, since having the leader slow down the ensemble is not ideal.


> txnlog diff sync can skip sending some transactions to followers
> ----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2418
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2418
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.1
>            Reporter: Nicholas Wolchko
>            Assignee: Nicholas Wolchko
>            Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If the leader is having disk issues so that its on-disk txnlog is behind the
> in-memory commit log, it will send a DIFF that is missing the transactions in
> between the two.
> Example:
> There are 5 hosts in the cluster. 1 is the leader. 5 is disconnected.
> We commit up to zxid 1000.
> At zxid 450, the leader's disk stalls, but we still commit transactions 
> because 2,3,4 are up and acking writes.
> At zxid 1000, the txnlog on the leader has 1-450 and the commit log has 
> 500-1000.
> Then host 5 regains its connection to the cluster and syncs with the leader. 
> It will receive a DIFF containing zxids 1-450 and 500-1000.
> This is because queueCommittedProposals in the LearnerHandler just queues 
> everything within its zxid range. It doesn't give an error if there is a gap 
> between peerLastZxid and the iterator it is queueing from.
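To make the reporter's scenario concrete, here is a simplified model of queueing both logs with no gap check, in the spirit of the two {{queueCommittedProposals}} calls. It is an illustration under stated assumptions, not the real method: {{NaiveDiff}}, {{queue}}, {{missing}} and {{range}} are hypothetical names, zxids are plain {{long}} values within a single epoch, and host 5 is assumed to be at zxid 0 (consistent with it receiving zxids starting from 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not ZooKeeper code. Zxids are plain longs from a single
// epoch, so two transactions are contiguous when the second is the first + 1.
class NaiveDiff {

    // Queue txn-log zxids in (peerLastZxid, minCommitted), then committed-log zxids
    // above the last one queued. Nothing checks whether the two runs actually meet.
    static List<Long> queue(List<Long> txnLog, List<Long> committedLog, long peerLastZxid) {
        long minCommitted = committedLog.get(0);
        List<Long> out = new ArrayList<>();
        long current = peerLastZxid;
        for (long z : txnLog) {
            if (z > current && z < minCommitted) {
                out.add(z);
                current = z;
            }
        }
        for (long z : committedLog) {
            if (z > current) {
                out.add(z);
                current = z;
            }
        }
        return out;
    }

    // Number of zxids skipped between peerLastZxid and the end of the DIFF.
    static long missing(List<Long> diff, long peerLastZxid) {
        long prev = peerLastZxid;
        long gaps = 0;
        for (long z : diff) {
            gaps += z - prev - 1;
            prev = z;
        }
        return gaps;
    }

    // Inclusive range [from, to] as a list of zxids.
    static List<Long> range(long from, long to) {
        List<Long> r = new ArrayList<>();
        for (long z = from; z <= to; z++) {
            r.add(z);
        }
        return r;
    }

    public static void main(String[] args) {
        // Reporter's example: txn log 1-450, committed log 500-1000, host 5 at zxid 0.
        List<Long> diff = queue(range(1, 450), range(500, 1000), 0L);
        // prints "951 txns sent, 49 missing"
        System.out.println(diff.size() + " txns sent, " + missing(diff, 0L) + " missing");
    }
}
```

The follower receives 951 transactions and has no way to tell that zxids 451-499 were never sent.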



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
