[
https://issues.apache.org/jira/browse/ZOOKEEPER-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978043#action_12978043
]
Hadoop QA commented on ZOOKEEPER-962:
-------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12467595/ZOOKEEPER-962.patch
against trunk revision 1053497.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
-1 findbugs. The patch appears to introduce 10 new Findbugs (version
1.3.9) warnings.
-1 release audit. The applied patch generated 25 release audit warnings
(more than the trunk's current 24 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/82//testReport/
Release audit warnings:
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/82//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings:
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/82//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/82//console
This message is automatically generated.
> leader/follower coherence issue when follower is receiving a DIFF
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-962
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-962
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.2
> Reporter: Camille Fournier
> Assignee: Camille Fournier
> Priority: Critical
> Fix For: 3.3.3, 3.4.0
>
> Attachments: ZOOKEEPER-962.patch
>
>
> From mailing list:
> It seems like we rely on the LearnerHandler thread startup to capture all of
> the missing committed
> transactions in the SNAP or DIFF, but I don't see anything (especially in the
> DIFF case) that
> is preventing us for committing more transactions before we actually start
> forwarding updates
> to the new follower.
> Let me explain using my example from ZOOKEEPER-919. Assume we have quorum
> already, so the
> leader can be processing transactions while my follower is starting up.
> I'm a follower at zxid N-5, the leader is at N. I send my FOLLOWERINFO packet
> to the leader
> with that information. The leader gets the proposals from its committed log
> (time T1), then
> syncs on the proposal list (LearnerHandler line 267. Why? It's a copy of the
> underlying proposal
> list... this might be part of our problem). I check to see if the
> peerLastZxid is within my
> max and min committed log and it is, so I'm going to send a diff. I set the
> zxidToSend to
> be the maxCommittedLog at time T3 (we already know this is sketchy), and
> forward the proposals
> from my copied proposal list starting at the peerLastZxid+1 up to the last
> proposal transaction
> (as seen at time T1).
> After I have queued up all those diffs to send, I tell the leader to
> startFowarding updates
> to this follower (line 308).
> So, let's say that at time T2 I actually swap out the leader to the thread
> that is handling
> the various request processors, and see that I got enough votes to commit
> zxid N+1. I commit
> N+1 and so my maxCommittedLog at T3 is N+1, but this proposal is not in the
> list of proposals
> that I got back at time T1, so I don't forward this diff to the client.
> Additionally, I processed
> the commit and removed it from my leader's toBeApplied list. So when I call
> startForwarding
> for this new follower, I don't see this transaction as a transaction to be
> forwarded.
> There's one problem. Let's also imagine, however, that I commit N+1 at time
> T4. The maxCommittedLog
> value is consistent with the max of the diff packets I am going to send the
> follower. But,
> I still committed N+1 and removed it from the toBeApplied list before calling
> startFowarding
> with this follower. How does the follower get this transaction? Does it?
> To put it another way, here is the thread interaction, hopefully formatted so
> you can read
> it...
> LearnerHandlerThread
> RequestProcessorThread
> T1(LH): get list of proposals (COPY)
> T2(RPT): commit
> N+1, remove from toBeApplied
> T3(LH): get maxCommittedLog
> T4(LH): send diffs from view at T1
> T5(LH): startForwarding
> Or
> T1(LH): get list of proposals (COPY)
> T2(LH): get maxCommittedLog
> T3(RPT): commit
> N+1, remove from toBeApplied
> T4(LH): send diffs from view at T1
> T5(LH): startFowarding
> I'm trying to figure out what, if anything, keeps the requests from being
> committed, removed,
> and never seen by the follower before it fully starts up.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.