sorry, i'm behind on my email. you are correct :)

ben
On Mon, Mar 28, 2011 at 1:03 PM, Fournier, Camille F. [Tech] <[email protected]> wrote:

> I take that back. Right after the UPTODATE send in LearnerHandler, we wait
> for the final ACK from that follower and call processAck on that packet. We
> need that ack set to reach a quorum set before we start up the Leader
> ZooKeeperServer. Until that is started, we won’t process REVALIDATE requests
> and we won’t accept connections ourselves (so clients can’t connect to us to
> revalidate their session). So I think we are ok.
>
> C
>
> *From:* Fournier, Camille F. [Tech]
> *Sent:* Monday, March 28, 2011 3:34 PM
> *To:* '[email protected]'
> *Subject:* RE: send UPTODATE to follower until a quorum of servers synced
> with leader
>
> Looking at the code it looks like we don’t need a synched quorum to accept
> a new client session, just a quorum in the process of synching, so I don’t
> think the session handling will solve this. I suppose it’s a warning that
> correctness for n=3 doesn’t extend to all possible cluster sizes of N.
>
> Definitely worth opening a JIRA.
>
> C
>
> *From:* Flavio Junqueira [mailto:[email protected]]
> *Sent:* Monday, March 28, 2011 11:49 AM
> *To:* [email protected]
> *Subject:* Re: send UPTODATE to follower until a quorum of servers synced
> with leader
>
> Hi Jiangwen, Good catch. I followed the code and it does sound like this
> scenario can happen, ignoring how sessions are handled. I checked that a
> follower takes a snapshot and starts a zookeeper server right after
> receiving an UPTODATE message. I'm not clear, though, if it is possible for
> a client to revalidate a session while the leader hasn't started. I was
> discussing with Ben offline and it sounds like we do not necessarily wait
> for a leader to come up to revalidate sessions. I'm not so familiar with the
> session handling part of the code, so I'll let perhaps Ben or someone else
> add to this discussion.
>
> In any case, you might want to open a jira to track our comments so that we
> don't miss important comments. I also wanted to point out that we have been
> observing a few corner cases like the one you raised, and we have been
> designing changes to the implementation that take care of such problems. If
> I'm not mistaken, the scenario you point out wouldn't happen under our
> changes because followers would wait for a commit message (wait for a quorum
> to ack) before starting a server, as you point out. The latest notes on the
> design are under Zab1.0 in the ZooKeeper wiki.
>
> Thanks,
>
> -Flavio
>
> On Mar 28, 2011, at 10:24 AM, jiangwen w wrote:
>
> 1. current process
> when leader fail, a new leader will be elected, followers will sync with the
> new leader.
> After synced, leader send UPTODATE to follower.
>
> 2. a corner case
> but there is a corner case, things will go wrong.
> suppose message M only exists on leader, after a follower synced with
> leader, the client connected to the follower will see M.
> but it only exists on two servers, not on a quorum of servers. If the new
> leader and the follower failed, message M is lost, but M is already seen by
> client.
>
> 3. one solution
> So I think UPTODATE can be sent to follower only when a quorum of server
> synced with the leader.
>
> Sincerely
>
> *flavio*
> *junqueira*
>
> research scientist
>
> [email protected]
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300 fax (408) 349 3301
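
For readers following the thread, the proposal in point 3 (hold back UPTODATE until a quorum of servers has synced with the new leader) can be sketched roughly as below. This is only an illustration and not ZooKeeper code: the class `UpToDateGate` and its methods are hypothetical names, and it assumes a simple majority quorum with the leader counted as synced with itself.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of ZooKeeper): tracks which servers have
 * finished syncing with the new leader, so that UPTODATE is only announced
 * once a majority of the ensemble holds the leader's state.
 */
final class UpToDateGate {
    private final int ensembleSize;
    private final Set<Long> synced = new HashSet<>();

    UpToDateGate(int ensembleSize, long leaderId) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensembleSize must be >= 1");
        }
        this.ensembleSize = ensembleSize;
        // Assumption: the leader counts as synced with itself.
        synced.add(leaderId);
    }

    /** Smallest strict majority of the ensemble (assumes majority quorums). */
    int quorumSize() {
        return ensembleSize / 2 + 1;
    }

    /**
     * Records that a server has finished syncing. Repeated reports from the
     * same server are counted once. Returns true once a quorum is synced.
     */
    synchronized boolean recordSynced(long serverId) {
        synced.add(serverId);
        return mayAnnounceUpToDate();
    }

    /** True when enough servers are synced for UPTODATE to be safe to send. */
    synchronized boolean mayAnnounceUpToDate() {
        return synced.size() >= quorumSize();
    }
}
```

With 3 servers the leader plus one synced follower already forms a quorum, which is why the corner case does not show up for n=3. With 5 servers the same two machines fall short of the 3 needed, so under this sketch the first follower would have to wait for a second follower to sync before receiving UPTODATE.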
