This pretty much matches what I expect. It would be great if you wanted to try your hand at creating a patch and submitting it to the ticket that was created for this problem, but if not, please post this analysis to issue 1465 and we'll look at it ASAP.
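
For anyone picking this up, the comparison described in the analysis quoted below can be reproduced in isolation. This is only a rough sketch under stated assumptions: the class ZxidSyncSketch and its methods are hypothetical names, not ZooKeeper code, and the zxid layout (epoch in the high 32 bits, counter in the low 32 bits) is inferred from the values quoted in the thread.

```java
// Sketch only: ZxidSyncSketch and its methods are hypothetical, not ZooKeeper's API.
final class ZxidSyncSketch {
    // Assumed layout: epoch in the high 32 bits, counter in the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Check as described in the thread: the follower's zxid is compared with
    // lastProposed, which the new leader has already bumped to the new epoch.
    static boolean inSyncByLastProposed(long peerLastZxid, long lastProposed) {
        return peerLastZxid == lastProposed;
    }

    // Proposed alternative: compare with the last zxid the leader actually processed.
    static boolean inSyncByLastProcessed(long peerLastZxid, long lastProcessedZxid) {
        return peerLastZxid == lastProcessedZxid;
    }

    public static void main(String[] args) {
        long lastProcessedZxid = makeZxid(4, 4); // 0x400000004, last real transaction
        long lastProposed = makeZxid(5, 0);      // 0x500000000, after the epoch bump
        long peerLastZxid = makeZxid(4, 4);      // follower has seen the same changes

        // The follower is up to date, yet the first check fails and a snapshot follows.
        System.out.println(inSyncByLastProposed(peerLastZxid, lastProposed));       // false
        System.out.println(inSyncByLastProcessed(peerLastZxid, lastProcessedZxid)); // true
    }
}
```

With the values from the scenario, the first comparison fails because of the epoch bump alone, while the second one would allow an empty diff.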
C

On Wed, May 16, 2012 at 2:55 PM, Vinayak Khot <[email protected]> wrote:
> We have also encountered a problem where the newly elected leader sends
> an entire snapshot to a follower even though the follower is in sync
> with the leader.
>
> A closer look at the code shows the problem in the logic where we decide
> to send a snapshot.
> The following scenario explains the problem in detail.
> Start a 3-node ZooKeeper ensemble where every quorum member has seen the
> same changes.
> zxid: *0x400000004*
>
> 1. When a newly elected leader starts, it bumps up its zxid to the new
> epoch.
>
> Code snippet from Leader.java
>
>     long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch());
>     zk.setZxid(ZxidUtils.makeZxid(epoch, 0));
>     synchronized(this){
>         lastProposed = zk.getZxid(); // 0x500000000
>     }
>
> 2. Now a follower tries to join the leader with its peerLastZxid =
> *0x400000004*
>
> Note that the leader now has an in-memory committedLog list with
> maxCommittedLog = *0x400000004*
>
> As committedLog doesn't have any new transactions with zxid >
> peerLastZxid, we check if the leader and follower are in sync.
>
> Code snippet from LearnerHandler.java
>
>     leaderLastZxid = leader.startForwarding(this, updates);
>     if (peerLastZxid == leaderLastZxid) { // 0x400000004 == 0x500000000
>         // We are in sync so we'll do an empty diff
>         packetToSend = Leader.DIFF;
>         zxidToSend = leaderLastZxid;
>     }
>
> Note that the function *leader.startForwarding()* returns the
> *lastProposed* zxid, which is already set to *0x500000000* by the leader.
> So in this scenario we never send an empty diff even though the leader
> and follower are in sync, and we end up sending the entire snapshot in
> the code that follows the above check.
>
> A possible fix would be to keep *lastProcessedZxid* in the leader, which
> will get updated only when the leader processes a transaction.
> While syncing with a follower, if the peerLastZxid sent by the follower
> is the same as the lastProcessedZxid of the leader, we can send an empty
> diff to the follower.
> This would avoid unnecessarily sending an entire snapshot when the
> leader and follower are already in sync.
>
> ZooKeeper developers, please share your views on the above-mentioned
> issue.
>
> - Vinayak
>
> On Mon, May 14, 2012 at 8:30 AM, Camille Fournier <[email protected]> wrote:
>
>> Thanks.
>> I just ran a couple of tests to start the debugging. Mark, I don't see
>> a long cluster settle with a mostly empty data set, so I think this
>> might be two different problems. I do see a lot of snapshots being
>> sent though, so there is probably some overaggressiveness in the way
>> we decide when to send snapshots that should be looked at.
>> Adding the dev mailing list, as I may need Ben or Flavio to take a
>> look as well.
>>
>> C
>>
>> On Thu, May 10, 2012 at 10:48 AM, <[email protected]> wrote:
>> > Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465
>> >
>> > -----Original Message-----
>> > From: Camille Fournier [mailto:[email protected]]
>> > Sent: 10 May 2012 14:58
>> > To: [email protected]
>> > Subject: Re: Possible issue with cluster availability following new
>> > Leader Election - ZK 3.4
>> >
>> > I will take a look at this soon, have you created a Jira for it? If
>> > not please do so.
>> >
>> > Thanks,
>> > C
>> >
>> > On Thu, May 10, 2012 at 7:20 AM, <[email protected]> wrote:
>> >> I think there may be a problem here with the 3.4 branch. I dropped the
>> >> cluster back to 3.3.5 and the behaviour was much better.
>> >>
>> >> To summarize:
>> >>
>> >> 650 MB of data
>> >> 20k nodes of varied size
>> >> 3 node cluster
>> >>
>> >> On 3.4.x (using latest branch build)
>> >> ---------
>> >> Takes 3-4 minutes to bring up a cluster from cold
>> >> Takes 40-50 secs to recover from a leader failure
>> >> Takes 10 secs for a new follower to join the cluster
>> >>
>> >> On 3.3.5
>> >> --------
>> >> Takes 10-20 secs to bring up a cluster from cold
>> >> Takes 10 secs to recover from a leader failure
>> >> Takes 10 secs for a new follower to join the cluster
>> >>
>> >> Any views on this from the ZK devs? The differences in behaviour only
>> >> start becoming apparent as the dataset gets bigger.
>> >> I was hoping to use 3.4 for the transactional features it offered via
>> >> the 'multi-update' operations, but this issue seems pretty serious...
>> >>
>> >> Visit our website at http://www.ubs.com
>> >>
>> >> This message contains confidential information and is intended only
>> >> for the individual named. If you are not the named addressee you
>> >> should not disseminate, distribute or copy this e-mail. Please notify
>> >> the sender immediately by e-mail if you have received this e-mail by
>> >> mistake and delete this e-mail from your system.
>> >>
>> >> E-mails are not encrypted and cannot be guaranteed to be secure or
>> >> error-free as information could be intercepted, corrupted, lost,
>> >> destroyed, arrive late or incomplete, or contain viruses. The sender
>> >> therefore does not accept liability for any errors or omissions in the
>> >> contents of this message which arise as a result of e-mail transmission.
>> >> If verification is required please request a hard-copy version. This
>> >> message is provided for informational purposes and should not be
>> >> construed as a solicitation or offer to buy or sell any securities or
>> >> related financial instruments.
>> >>
>> >> UBS Limited is a company limited by shares incorporated in the United
>> >> Kingdom registered in England and Wales with number 2035362.
>> >> Registered office: 1 Finsbury Avenue, London EC2M 2PP. UBS Limited is
>> >> authorised and regulated by the Financial Services Authority.
>> >>
>> >> UBS AG is a public company incorporated with limited liability in
>> >> Switzerland domiciled in the Canton of Basel-City and the Canton of
>> >> Zurich respectively registered at the Commercial Registry offices in
>> >> those Cantons with Identification No: CH-270.3.004.646-4 and having
>> >> respective head offices at Aeschenvorstadt 1, 4051 Basel and
>> >> Bahnhofstrasse 45, 8001 Zurich, Switzerland. Registered in the United
>> >> Kingdom as a foreign company with No: FC021146 and having a UK
>> >> Establishment registered at Companies House, Cardiff, with No:
>> >> BR 004507. The principal office of UK Establishment: 1 Finsbury
>> >> Avenue, London EC2M 2PP. In the United Kingdom, UBS AG is authorised
>> >> and regulated by the Financial Services Authority.
>> >>
>> >> UBS reserves the right to retain all messages. Messages are protected
>> >> and accessed only in legally justified cases.
