Hi Alex, Hongchao, I agree with the assertion that we should favor correctness here.
Patrick

On Mon, Mar 9, 2015 at 9:10 PM, Alexander Shraer <[email protected]> wrote:
> Bumping this thread as it was brought up again by Hongchao in the context of failing tests.
>
> I agree with him that it's much better to make sync a quorum operation; however, this may come at the expense of slightly slower strong reads. I think this is acceptable, since if someone does "sync + read" he cares more about semantics than latency.
>
> On Fri, Mar 1, 2013 at 7:06 PM, Alexander Shraer <[email protected]> wrote:
>>
>> This is an old thread (below), but it doesn't seem like any conclusion was reached on what we want to do to address the issue.
>>
>> Reminder of the problem: sync only gets you strong semantics if there is no leader change. If there is a leader change, then these semantics are guaranteed only if we make some timing assumptions, not made elsewhere in ZooKeeper. It would be much better not to make timing assumptions for such safety/consistency properties, only for liveness.
>>
>> The problem happens when your leader is no longer the leader but doesn't know it yet. It responds to a sync, but that doesn't mean your follower sees all committed state. Some other server may have already become the leader and committed some updates, which the sync won't flush to your follower, which is still connected to the old leader.
>>
>> To prevent this we should broadcast the sync like updates, or piggyback it on other ops, or perhaps create a new type of sync that is broadcast.
>>
>> As Ben pointed out, this problem is also mentioned in Section 4.4 of the ZooKeeper paper (but the proposed solution there is insufficient to solve the issue, as discussed below).
>>
>> Alex
>>
>> On Fri, Sep 28, 2012 at 4:45 PM, John Carrino <[email protected]> wrote:
>> > Ben, after thinking about this more, I don't think this solution gets the property that I need. Just because there are outstanding proposals that are committed later doesn't imply we are still the leader. It only means that when the new leader does recovery it will also see these proposals as committed.
>> >
>> > Let's say we have a 5 node cluster and L1 has one pending request out. F2-F5 are followers. We get back an ack from F2. Now F5 and L1 are partitioned off from the network along with client C1.
>> >
>> > Recovery happens on F2-F4 and F2 becomes L2. During recovery this proposal is accepted because F2 had acked it. Now L2 does a bunch of stuff, including deleting your ephemeral node.
>> >
>> > Now a sync comes in from C1 through F5. Now L1 finally gets that ack from F5, goes ahead and commits it, and responds to the outstanding sync request to C1.
>> >
>> > We can see with this ordering there isn't a happens-after relationship between the sync request and knowing about all commits that occurred before the sync request.
>> >
>> > Yes, I realize that this ordering is unlikely to happen in practice, but I hate trusting time for anything.
>> >
>> > -jc
>> >
>> > On Fri, Sep 28, 2012 at 7:31 AM, John Carrino <[email protected]> wrote:
>> >>
>> >> This seems like a good compromise. We still have to eat the latency of a write, but we easily achieve smart batching in this case, so many outstanding syncs can all be serviced by the same lastPending request.
>> >>
>> >> -jc
>> >>
>> >> On Thu, Sep 27, 2012 at 11:17 PM, Benjamin Reed <[email protected]> wrote:
>> >>>
>> >>> there is a very easy solution to this. we only rely on clocks in the case that there are no pending transactions. (if there are pending transactions, the sync will only return if in fact the leader is still the leader; otherwise the transaction that the sync is waiting on will never commit and the sync will never return.)
>> >>>
>> >>> so, if there aren't any transactions, just submit one. make it a bogus one: create / for example. then queue the sync behind it.
>> >>>
>> >>> ben
>> >>>
>> >>> ps - we bring up this issue, the solution, and the rationale for the current implementation in section 4.4 of the zookeeper usenix paper.
>> >>>
>> >>> On Thu, Sep 27, 2012 at 9:57 AM, John Carrino <[email protected]> wrote:
>> >>> > So I think it's time to explain what I'm writing just so everyone has more situational awareness. It's just a timestamp server, nothing fancy.
>> >>> >
>> >>> > Looks like this:
>> >>> >
>> >>> > public interface TimestampService {
>> >>> >     /**
>> >>> >      * This will get a fresh timestamp that is guaranteed to be newer than
>> >>> >      * any other timestamp handed out before this method was called.
>> >>> >      */
>> >>> >     long getFreshTimestamp();
>> >>> > }
>> >>> >
>> >>> > The only requirement is that the timestamp handed back is greater than every other timestamp that was returned before getFreshTimestamp was called. There is no ordering requirement for concurrent requests.
>> >>> >
>> >>> > My impl is to reserve blocks of timestamps that are safe to hand out (1M at a time) using compare and swap in ZK:
>> >>> >
>> >>> > lastPossibleUsed = read(HighWater)
>> >>> > safeToHandout = compareAndSwap(lastPossibleUsed, lastPossibleUsed + 1M)
>> >>> >
>> >>> > Now my leader can hand back timestamps up to safeToHandout, but before it hands one out it must ensure it is still the leader (no one else has handed back something higher). I can use ensureQuorum(), exists(myEphemNode) to make sure this is the case. Now I have a service that is guaranteed to be correct but doesn't require disk hits in the steady state, which brings down my latency (if you get close to running out, you can compareAndSwap for more timestamps).
>> >>> >
>> >>> > If many requests come in at the same time I can use smart batching to verify happens-after for all of them at once. We can also add more layers if we need more bandwidth to scale up, at the cost of adding latency. Basically our latency will be O(lg(requestRate)) if we keep adding layers as each previous layer becomes saturated.
>> >>> >
>> >>> > I hope this explanation helps. I am busy for the next 4 hours, but if you need more clarification I can respond then.
>> >>> >
>> >>> > -jc
>> >>> >
>> >>> > On Thu, Sep 27, 2012 at 9:26 AM, John Carrino <[email protected]> wrote:
>> >>> >>
>> >>> >> First, thanks everyone for talking this through with me.
>> >>> >>
>> >>> >> Flavio, for your example, this is actually ok.
>> >>> >> There is a happens-after relationship between the client making the request and my leader C1 still being the leader. My service only needs to guarantee that what it hands back is at least as new as anything that existed when the client made the request. If C2 were to answer requests while C1 is stalling, that is ok, because these would be considered concurrent requests, and the stuff returned by C2 may be newer, but that doesn't violate any guarantees.
>> >>> >>
>> >>> >> If some client were to get back something from C2 and then (happens-after relationship) someone tried to read from C1, it needs to fail.
>> >>> >>
>> >>> >> To address your concern of adding too much bandwidth, we can get this easily by doing what Martin Thompson calls smart batching (http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html):
>> >>> >>
>> >>> >> 1. ensureQuorum request comes in to L1
>> >>> >> 2. send ENSURE to all followers
>> >>> >> 3. 10 more ensureQuorum requests come in
>> >>> >> 4. get back ENSURE from quorum
>> >>> >> 5. we can now service all 10 pending ensureQuorum requests with another round-trip ENSURE
>> >>> >>
>> >>> >> We don't need to send an ENSURE for every ensureQuorum request; we just need it to be happens-after from when the request arrived.
>> >>> >>
>> >>> >> I am fine with the ephemeral node being removed after some time expires, but only by the leader. If the leader's clock is broken and the client owning the ephemeral node drops off, then we don't have liveness (because this node may not get cleaned up in a timely fashion). However, we still preserve correctness.
>> >>> >>
>> >>> >> -jc
>> >>> >>
>> >>> >> On Thu, Sep 27, 2012 at 9:02 AM, Flavio Junqueira <[email protected]> wrote:
>> >>> >>>
>> >>> >>> Say that we implement what you're suggesting. Could you check if this scenario can happen:
>> >>> >>>
>> >>> >>> 1- Client C1 is the current leader and it issues a super boosted read to make sure it is still the leader;
>> >>> >>> 2- We process the super boosted read, having it go through the zab pipeline;
>> >>> >>> 3- When we send the response to C1 we slow down the whole deal: the response to C1 gets delayed and we stall C1;
>> >>> >>> 4- In the meanwhile, C1's session expires on the server side and its ephemeral leadership node is removed;
>> >>> >>> 5- A new client C2 is elected and starts exercising leadership;
>> >>> >>> 6- Now C1 comes back to normal and receives the response of the super boosted read saying that it is still the leader.
>> >>> >>>
>> >>> >>> If my interpretation is correct, the only way to prevent this scenario from happening is if the session expires on the client side before it receives the response of the read. It doesn't look like we can do that if process clocks can be arbitrarily delayed.
>> >>> >>>
>> >>> >>> Note that one issue is that the behavior of ephemerals is highly dependent upon timers, so I don't think we can avoid making some timing assumptions altogether. The question is whether we are better off with a mechanism relying upon acknowledgements. My sense is that application-level fencing is preferable (if not necessary) for applications like the ones JC is mentioning, or BookKeeper.
>> >>> >>>
>> >>> >>> I'm not concerned about writes to disk, which I agree we don't need for sync. I'm more concerned about having it go through the whole pipeline, which will induce more traffic to zab and increase latency for an application that uses it heavily.
>> >>> >>>
>> >>> >>> -Flavio
>> >>> >>>
>> >>> >>> On Sep 27, 2012, at 5:27 PM, Alexander Shraer wrote:
>> >>> >>>
>> >>> >>> > another idea is to add this functionality to MultiOp - have read-only transactions be replicated but not logged, or logged asynchronously. I'm not sure how it works right now if I do a read-only MultiOp transaction - does it replicate the transaction or answer it locally on the leader?
>> >>> >>> >
>> >>> >>> > Alex
>> >>> >>> >
>> >>> >>> > On Thu, Sep 27, 2012 at 8:07 AM, Alexander Shraer <[email protected]> wrote:
>> >>> >>> >> Thanks for the explanation.
>> >>> >>> >>
>> >>> >>> >> I guess one could always invoke a write operation instead of sync to get the stricter semantics, but as John suggests, it might be a good idea to add a new type of operation that requires followers to ack but doesn't require them to log to disk - this seems sufficient in our case.
>> >>> >>> >>
>> >>> >>> >> Alex
>> >>> >>> >>
>> >>> >>> >> On Thu, Sep 27, 2012 at 3:56 AM, Flavio Junqueira <[email protected]> wrote:
>> >>> >>> >>> In theory, the scenario you're describing could happen, but I would argue that it is unlikely given that: 1) a leader pings followers twice a tick to make sure that it has a quorum of supporters (lead()); 2) followers give up on a leader upon catching an exception (followLeader()). One could calibrate tickTime to make the probability of having this scenario low.
>> >>> >>> >>>
>> >>> >>> >>> Let me also revisit the motivation for the way we designed sync. ZooKeeper has been designed to serve reads efficiently, and making sync go through the pipeline would slow down reads. Although optional, we thought it would be a good idea to make it as efficient as possible to comply with the original expectations for the service. We consequently came up with this cheap way of making sure that a read sees all pending updates. It is correct that there are some corner cases that it doesn't cover. One is the case you mentioned.
>> >>> >>> >>> Another is having the sync finish before the client submits the read, and having a write commit in between. We rely upon the way we implement timeouts and some minimum degree of synchrony for the clients when submitting operations to guarantee that the scheme works.
>> >>> >>> >>>
>> >>> >>> >>> We thought about the option of having the sync operation go through the pipeline, and in fact it would have been easier to implement it just as a regular write, but we opted not to because we felt it was sufficient for the use cases we had, and more efficient, as I already argued.
>> >>> >>> >>>
>> >>> >>> >>> Hope it helps to clarify.
>> >>> >>> >>>
>> >>> >>> >>> -Flavio
>> >>> >>> >>>
>> >>> >>> >>> On Sep 27, 2012, at 9:38 AM, Alexander Shraer wrote:
>> >>> >>> >>>
>> >>> >>> >>>> thanks for the explanation! but how do you avoid the scenario raised by John? let's say you're a client connected to F, and F is connected to L. Let's also say that L's pipeline is now empty, and both F and L are partitioned from 3 other servers in the system that have already elected a new leader L'. Now I go to L' and write something. L still thinks it's the leader because the detection that followers left it is obviously timeout dependent. So when F sends your sync to L and L returns it to F, you actually miss my write!
>> >>> >>> >>>>
>> >>> >>> >>>> Alex
>> >>> >>> >>>>
>> >>> >>> >>>> On Thu, Sep 27, 2012 at 12:32 AM, Flavio Junqueira <[email protected]> wrote:
>> >>> >>> >>>>> Hi Alex, Because of the following:
>> >>> >>> >>>>>
>> >>> >>> >>>>> 1- A follower F processes operations from a client in FIFO order, and say that a client submits, as you say, sync + read;
>> >>> >>> >>>>> 2- A sync will be processed by the leader and returned to the follower. It will be queued after all pending updates that the follower hasn't processed;
>> >>> >>> >>>>> 3- The follower will process all pending updates before processing the response of the sync;
>> >>> >>> >>>>> 4- Once the follower processes the sync, it picks the read operation to process. It reads the local state of the follower and returns to the client.
>> >>> >>> >>>>>
>> >>> >>> >>>>> When we process the read in Step 4, we have applied all pending updates the leader had for the follower by the time the read request started.
>> >>> >>> >>>>>
>> >>> >>> >>>>> This implementation is a bit of a hack because it doesn't follow the same code path as the other operations that go to the leader, but it avoids some unnecessary steps, which is important for fast reads.
>> >>> >>> >>>>> In the sync case, the other followers don't really need to know about it (there is nothing to be updated), and the leader simply inserts it in the sequence of updates of F, ordering it.
>> >>> >>> >>>>>
>> >>> >>> >>>>> -Flavio
>> >>> >>> >>>>>
>> >>> >>> >>>>> On Sep 27, 2012, at 9:12 AM, Alexander Shraer wrote:
>> >>> >>> >>>>>
>> >>> >>> >>>>>> Hi Flavio,
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>> Starting a read operation concurrently with a sync implies that the result of the read will not miss an update committed before the read started.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> I thought that the intention of sync was to give something like linearizable reads, so if you invoke a sync and then a read, your read is guaranteed to (at least) see any write which completed before the sync began. Is this the intention? If so, how is this achieved without running agreement on the sync op?
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> Thanks,
>> >>> >>> >>>>>> Alex
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> On Thu, Sep 27, 2012 at 12:05 AM, Flavio Junqueira <[email protected]> wrote:
>> >>> >>> >>>>>>> sync simply flushes the channel between the leader and the follower that forwarded the sync operation, so it doesn't go through the full zab pipeline. Flushing means that all pending updates from the leader to the follower are received by the time sync completes. Starting a read operation concurrently with a sync implies that the result of the read will not miss an update committed before the read started.
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> -Flavio
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> On Sep 27, 2012, at 3:43 AM, Alexander Shraer wrote:
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>>> It's strange that sync doesn't run through agreement; I was always assuming that it did... Exactly for the reason you say - you may trust your leader, but I may have a different leader, and your leader may not have detected it yet and still think it's the leader.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> This seems like a bug to me.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> Similarly to Paxos, ZooKeeper's safety guarantees don't (or shouldn't) depend on timing assumptions. Only progress guarantees depend on time.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> Alex
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> On Wed, Sep 26, 2012 at 4:41 PM, John Carrino <[email protected]> wrote:
>> >>> >>> >>>>>>>>> I have some pretty strong requirements in terms of consistency, where reading from followers that may be behind in terms of updates isn't ok for my use case.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> One error case that worries me is if a follower and leader are partitioned off from the network. A new leader is elected, but the follower and old leader don't know about it.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> Normally I think sync was made for this purpose, but I looked at the sync code, and if there aren't any outstanding proposals the leader sends the sync right back to the client without first verifying that it still has quorum, so this won't work for my use case.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> At the core of the issue, all I really need is a call that will make its way to the leader, ping its followers, ensure it still has a quorum, and return success.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> Basically a getCurrentLeaderEpoch() method that will be forwarded to the leader; the leader will ensure it still has quorum and return its epoch. I can use this primitive to implement all the other properties I want to verify (assuming that my client will never connect to an older epoch after this call returns). Also the nice thing about this method is that it will not have to hit disk, and the latency should just be a round trip to the followers.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> Most of the guarantees offered by ZooKeeper are time based and rely on clocks and expiring timers, but I'm hoping to offer some guarantees in spite of busted clocks, horrible GC perf, VM suspends, and any other way time is broken.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> Also if people are interested I can go into more detail about what I am trying to write.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> -jc
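For anyone who wants the stronger semantics today, here is a rough client-side sketch of Ben's "queue the sync behind a pending write" idea from the thread above. It is only an approximation built on the current API, not something the implementation guarantees; the fence znode path is made up, and rc checks and error handling are omitted.

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class QuorumRead {
        // Hypothetical scratch znode used only to force a proposal through Zab;
        // it is assumed to already exist.
        private static final String FENCE_PATH = "/quorum-read-fence";

        /**
         * Issue a dummy write and a sync back to back (both async, so the
         * session's FIFO order delivers them to the leader together), then read.
         * The idea from the thread: a sync queued behind a pending proposal
         * cannot complete unless that proposal commits, i.e. unless the leader
         * still has a quorum.
         */
        public static byte[] read(ZooKeeper zk, String path) throws Exception {
            final CountDownLatch pending = new CountDownLatch(2);
            // Dummy write through the full Zab pipeline; rc check omitted.
            zk.setData(FENCE_PATH, new byte[0], -1,
                    (rc, p, ctx, stat) -> pending.countDown(), null);
            // sync() is asynchronous in the Java client.
            zk.sync(path, (rc, p, ctx) -> pending.countDown(), null);
            pending.await();
            return zk.getData(path, false, new Stat());
        }
    }

If the old leader has lost its quorum, the fence write never commits, so this call stalls or fails with a connection loss instead of returning a stale read, which is the behavior people in the thread are asking sync itself to have.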

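Since a couple of people asked about JC's timestamp service, the "reserve 1M timestamps via compare and swap in ZK" step he describes maps naturally onto a versioned setData. A minimal sketch, assuming a made-up high-water znode that stores the counter as decimal text:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class TimestampBlockReserver {
        // Hypothetical znode holding the highest timestamp ever reserved.
        private static final String HIGH_WATER_PATH = "/timestamp-high-water";
        private static final long BLOCK_SIZE = 1_000_000L; // "1M at a time", as in the thread

        /** Reserves the next block; returns the highest timestamp now safe to hand out. */
        public static long reserveBlock(ZooKeeper zk)
                throws KeeperException, InterruptedException {
            while (true) {
                Stat stat = new Stat();
                byte[] raw = zk.getData(HIGH_WATER_PATH, false, stat);
                long lastPossibleUsed = Long.parseLong(new String(raw, StandardCharsets.UTF_8));
                long safeToHandOut = lastPossibleUsed + BLOCK_SIZE;
                try {
                    // setData with an expected version only succeeds if nobody else
                    // advanced the high-water mark since we read it.
                    zk.setData(HIGH_WATER_PATH,
                            Long.toString(safeToHandOut).getBytes(StandardCharsets.UTF_8),
                            stat.getVersion());
                    return safeToHandOut;
                } catch (KeeperException.BadVersionException lostRace) {
                    // Another instance reserved a block first; re-read and retry.
                }
            }
        }
    }

The version check is what makes the setData act as a compare-and-swap: a concurrent reservation fails with BADVERSION and the loop re-reads and retries. The harder part, and what the thread is really about, is JC's separate requirement to confirm the node is still the leader before handing any of the reserved timestamps out.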