Yeah, I agree with Camille, there seems to be some sensitivity around the
topic, even though this thread seems to be a simple exercise in understanding
what is out there.
Performing periodic verifications of the txn log CRCs enables one to detect
latent errors earlier, but I'm not sure what you gain by spotting it earlier.
Would you use it to recover the corrupt state by rewriting the corrupt
record(s)? Would you use it to trigger a snapshot round? Or would you simply
make your user unhappy sooner?
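To make the question concrete, here is a minimal sketch of what such a periodic scan could look like. It is only an illustration: the record layout (zxid, payload, stored CRC), the chaining of each CRC to the previous one, and the function names are all assumptions, not the actual ZooKeeper txn log format or the etcd implementation.

```python
import zlib
from typing import List, Optional, Tuple

# Hypothetical record layout: (zxid, payload, stored_crc).
Record = Tuple[int, bytes, int]


def append(log: List[Record], zxid: int, payload: bytes) -> None:
    """Append a record whose CRC is chained to the previous record's CRC."""
    prev_crc = log[-1][2] if log else 0
    log.append((zxid, payload, zlib.crc32(payload, prev_crc)))


def first_corrupt_zxid(log: List[Record]) -> Optional[int]:
    """Recompute the CRC chain; return the first zxid that fails, else None."""
    prev_crc = 0
    for zxid, payload, stored_crc in log:
        if zlib.crc32(payload, prev_crc) != stored_crc:
            return zxid
        prev_crc = stored_crc
    return None


def crc_at(log: List[Record], zxid: int) -> Optional[int]:
    """Return the stored CRC for a zxid, for comparison across replicas."""
    for z, _, crc in log:
        if z == zxid:
            return crc
    return None
```

A scan like `first_corrupt_zxid` only tells you where the damage starts; what to do next (rewrite the record, force a snapshot, or just alert) is still the open question above.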
On the mock network partition bit, Pat Hunt proposed a while back that we
introduce a fault-injection framework so that we could perform tests like the
ones you described. Later on Andrei Savu made a similar proposal (see ZK-1364
for context), but that's still open. It'd be a good thing to have this feature,
and note that this is broader than just network partitions.
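As a rough idea of the shape such a framework could take, here is a minimal in-memory sketch. The class and method names are hypothetical and it is not the design proposed in the JIRA nor the etcd rafttest code; it only shows a test network that can cut and heal links between node ids.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, FrozenSet, Optional, Set, Tuple


class FaultyNetwork:
    """In-memory message bus for tests (hypothetical names throughout)."""

    def __init__(self) -> None:
        # One FIFO inbox per destination node id.
        self.inboxes: Dict[int, Deque[Tuple[int, Any]]] = defaultdict(deque)
        # Unordered node pairs whose link is currently cut.
        self.cut: Set[FrozenSet[int]] = set()

    def partition(self, a: int, b: int) -> None:
        """Drop all traffic between a and b, in both directions."""
        self.cut.add(frozenset((a, b)))

    def heal(self) -> None:
        """Restore every cut link."""
        self.cut.clear()

    def send(self, src: int, dst: int, msg: Any) -> bool:
        """Deliver msg to dst's inbox unless the link is cut."""
        if frozenset((src, dst)) in self.cut:
            return False
        self.inboxes[dst].append((src, msg))
        return True

    def recv(self, node: int) -> Optional[Tuple[int, Any]]:
        """Pop the oldest pending (src, msg) for node, or None if empty."""
        inbox = self.inboxes[node]
        return inbox.popleft() if inbox else None
```

Delay, reordering, and message injection would hang off the same `send` hook, which is why this is broader than partitions alone.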
-Flavio
On Monday, April 13, 2015 3:07 AM, Camille Fournier <[email protected]>
wrote:
Thanks for the information, appreciate the clarifications.
From my point of view, no one is obsessing over who is better and I'm not
sure where you got the idea that this was a competition. Perhaps you've
misread the tone of the emails but I'm just trying to understand how we can
better address the needs of the community at large.
Thanks
C
On Apr 12, 2015 9:17 PM, "Hongchao Deng" <[email protected]> wrote:
> For reliability, ZooKeeper is in practice more stable and robust.
> Xiang Li from etcd has often asked me questions about ZooKeeper
> JIRAs and code. He made some designs based on the lessons learned from ZK.
> One design I think is very useful is the CRC. Basically, every index (zxid)
> saves a CRC in the log. The cluster periodically checks the CRC for a
> specific index (zxid). For example, when taking a snapshot, it checks the
> CRC for the snapshot index (zxid) cluster-wide. In this way, it's very
> likely to detect any data inconsistency bug.
> Another one is the mock network partition. It's good to have a test network
> interface to do packet delay, partition, injection, etc.
> https://github.com/coreos/etcd/blob/279b216f9a2ba758dae6522db0d22a2a8f628c15/raft/rafttest/network.go#L20-L30
> This is quite useful because the hard problems found in JIRAs happen in
> partitions. And it's easier to write a unit test when such a bug is found.
> It's all open source and you can learn from the code. Open source is not a
> "who's the better" fight, but making the code better every day. And it's
> BAD to be obsessive in comparing etcd and ZooKeeper.
> - Hongchao Deng
>
> > Date: Sun, 12 Apr 2015 20:01:14 -0400
> > Subject: RE: design thoughts: node TTLs
> > From: [email protected]
> > To: [email protected]
> >
> > Hongchao,
> > Instead of telling us we're misunderstanding things it would be great if
> > you tried to explain them. I don't think anyone here is trying to
> > misunderstand but we don't all have the time to become experts in
> > multiple systems.
> > Beyond that, what is it about etcd that makes them better from your
> > perspective with respect to reliability and robustness?
> >
> > Thanks
> > C
> > On Apr 12, 2015 7:05 PM, "Hongchao Deng" <[email protected]>
> wrote:
> >
> > >
> > >
> > >
> > > I believe the etcd index and TTL are the major features behind the lease
> > > implementation. If you think that's a great REST feature, you are probably
> > > misunderstanding it.
> > > I think what we should take from etcd is reliability and robustness. How
> > > can we better prevent data inconsistency when new features are being added
> > > -- CRC, mock network partition? After that, it's scalability, like the
> > > proxy (or better, observer) I described.
> > > With reliability and scalability taken care of, building a REST thing on
> > > top is a great add-on.
> > > - Hongchao Deng
> > >
> > > > Date: Sat, 11 Apr 2015 15:26:29 -0400
> > > > Subject: Re: design thoughts: node TTLs
> > > > From: [email protected]
> > > > To: [email protected]
> > > > CC: [email protected]; [email protected]
> > > >
> > > > Ah here we go I had to dig, from:
> > > > https://github.com/coreos/etcd/blob/master/Documentation/api.md
> > > >
> > > > However, the watch command can do more than this. Using the index, we can
> > > > watch for commands that have happened in the past. This is useful for
> > > > ensuring you don't miss events between watch commands. Typically, we watch
> > > > again from the (modifiedIndex + 1) of the node we got.
> > > >
> > > > Let's try to watch for the set command of index 7 again:
> > > >
> > > > curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=7'
> > > >
> > > >
> > > > On Sat, Apr 11, 2015 at 3:24 PM, Camille Fournier <[email protected]>
> > > > wrote:
> > > >
> > > > > For some reason I thought there were commands that took raft term/index
> > > > > parameters and would show details over that but now I'm not seeing it so
> > > > > perhaps I am just losing my mind. Too many consensus systems.
> > > > >
> > > > > On Sat, Apr 11, 2015 at 3:14 PM, Flavio Junqueira <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> What raft enhancements?
> > > > >>
> > > > >> -Flavio
> > > > >>
> > > > >> > On 11 Apr 2015, at 20:07, Camille Fournier <[email protected]> wrote:
> > > > >> >
> > > > >> > It does look like embedding the existing REST contrib into the servers
> > > > >> > themselves would answer the use case I'm seeing. Then it might just be a
> > > > >> > matter of making it clear that this option is available ;)
> > > > >> >
> > > > >> > We may also look to see what the API exposed by etcd/consul is. Perhaps a
> > > > >> > generic API standard for accessing systems like this via HTTP would be
> > > > >> > nice, for cross-compatibility. There are going to be some differences due
> > > > >> > to the raft enhancements but the basics could be designed to be
> > > > >> > cross-compatible.
> > > > >> >
> > > > >> > C
> > > > >> >
> > > > >> > On Sat, Apr 11, 2015 at 2:32 PM, Patrick Hunt <[email protected]> wrote:
> > > > >> >
> > > > >> >> We pack quite a bit of functionality in our client side libraries. That's
> > > > >> >> probably one of the main things you'll notice if you try to use REST. But
> > > > >> >> then if your use case is primarily configuration, or something that doesn't
> > > > >> >> require sessions then the current REST support should be more than
> > > > >> >> sufficient. What are the other systems doing that we are currently not
> > > > >> >> doing wrt API support?
> > > > >> >>
> > > > >> >> I'd think that taking our current rest contrib, updating the dependencies,
> > > > >> >> and deploying as part of the embedded jetty would work well with minimal
> > > > >> >> effort.
> > > > >> >>
> > > > >> >> Patrick
> > > > >> >>
> > > > >> >> On Sat, Apr 11, 2015 at 9:22 AM, Camille Fournier <[email protected]>
> > > > >> >> wrote:
> > > > >> >>
> > > > >> >>> I agree that they have to write clients that do work, but there is clearly
> > > > >> >>> a desire and willingness out there to do that work in exchange for a more
> > > > >> >>> "obvious" way of interacting. So supporting it should be something that we
> > > > >> >>> consider if the users of such systems want the option.
> > > > >> >>>
> > > > >> >>> On Sat, Apr 11, 2015 at 12:07 PM, Jordan Zimmerman <
> > > > >> >>> [email protected]> wrote:
> > > > >> >>>
> > > > >> >>>> REST has issues for end users. They will still have to write clients and
> > > > >> >>>> do a lot of extra work. REST support is good in most languages but having
> > > > >> >>>> native support is superior. That’s why I chose Thrift for the CuratorRPC
> > > > >> >>>> Proxy.
> > > > >> >>>>
> > > > >> >>>> -Jordan
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> On April 10, 2015 at 9:32:47 PM, Michi Mutsuzaki ([email protected])
> > > > >> >>>> wrote:
> > > > >> >>>>
> > > > >> >>>> Yeah it's quite painful to manage another set of processes just for
> > > > >> >>>> proxying requests. I'd definitely use this if I can run it embedded in
> > > > >> >>>> the ZooKeeper process. I'm very excited about the idea of being able
> > > > >> >>>> to use curl to look at what's in ZooKeeper :)
> > > > >> >>>>
> > > > >> >>>> On Fri, Apr 10, 2015 at 6:03 PM, Patrick Hunt <[email protected]> wrote:
> > > > >> >>>>> Here's the spec and readme from contrib/rest:
> > > > >> >>>>>
> > > > >> >>>>> http://svn.apache.org/viewvc/zookeeper/trunk/src/contrib/rest/SPEC.txt?view=markup
> > > > >> >>>>>
> > > > >> >>>>> http://svn.apache.org/viewvc/zookeeper/trunk/src/contrib/rest/README.txt?view=markup
> > > > >> >>>>>
> > > > >> >>>>> The current implementation is a standalone proxy. It's not embedded in zk
> > > > >> >>>>> itself. That might be part of the reason.
> > > > >> >>>>>
> > > > >> >>>>> Patrick
> > > > >> >>>>>
> > > > >> >>>>> On Fri, Apr 10, 2015 at 4:43 PM, Camille Fournier <[email protected]>
> > > > >> >>>>> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> Forgive me for not reading the code, can you share in more detail what the
> > > > >> >>>>>> existing REST proxy provides? I'm curious why people are jumping to use
> > > > >> >>>>>> etcd because of ease of use w/http access if we already have something that
> > > > >> >>>>>> works?
> > > > >> >>>>>>
> > > > >> >>>>>> On Fri, Apr 10, 2015 at 7:28 PM, Patrick Hunt <[email protected]> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>> There's the REST work in contrib. Both Andrei and I worked on that - I did
> > > > >> >>>>>>> the basic support and Andrei added sessions and heartbeating among other
> > > > >> >>>>>>> improvements.
> > > > >> >>>>>>>
> > > > >> >>>>>>> Now that 3.5 has embedded Jetty it should be much simpler to run REST as
> > > > >> >>>>>>> part of the ZK service itself. When the original proxy work was done Jetty
> > > > >> >>>>>>> was not yet part of ZK.
> > > > >> >>>>>>>
> > > > >> >>>>>>> Patrick
> > > > >> >>>>>>>
> > > > >> >>>>>>> On Thu, Apr 9, 2015 at 10:20 AM, Jordan Zimmerman <
> > > > >> >>>>>>> [email protected]> wrote:
> > > > >> >>>>>>>
> > > > >> >>>>>>>> Since Curator is now Apache and I'm no longer at Netflix, I don't follow
> > > > >> >>>>>>>> Netflix messages very much. Sorry about that.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> -Jordan
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On April 9, 2015 at 12:15:02 PM, Camille Fournier ([email protected])
> > > > >> >>>>>>>> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Thanks Jordan! I actually asked on Twitter whether Netflix had anything
> > > > >> >>>>>>>> but didn't get a clear answer.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On Thu, Apr 9, 2015 at 11:22 AM, Jordan Zimmerman <
> > > > >> >>>>>>>> [email protected]> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>> FYI
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Curator now has a Thrift-based proxy that has all the ZK APIs exposed as
> > > > >> >>>>>>>>> well as Curator's added APIs and recipes:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> http://curator.apache.org/curator-x-rpc/index.html
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> -Jordan
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> On April 8, 2015 at 2:09:33 PM, Camille Fournier ([email protected])
> > > > >> >>>>>>>>> wrote:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> All,
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> I've been doing a bit of research on etcd as part of work for an upcoming
> > > > >> >>>>>>>>> talk, and it has gotten me thinking about what it would take to create an
> > > > >> >>>>>>>>> http version of ZK for certain operations. For many operations you could
> > > > >> >>>>>>>>> put an http proxy in front of ZK to translate, even implementing the
> > > > >> >>>>>>>>> "long-poll-style" watch operation to some extent. But it would be very
> > > > >> >>>>>>>>> hard to do a temporary node via a proxy without a lot of proxy failover
> > > > >> >>>>>>>>> complexity.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> As a bit of background, if you want to do an "ephemeral" node in etcd, you
> > > > >> >>>>>>>>> basically create a key with a TTL. Unless the key is updated with a new
> > > > >> >>>>>>>>> TTL, the key will auto-expire when the TTL is reached. Now, I have a lot
> > > > >> >>>>>>>>> of thoughts about this (seems like you have to implement heartbeats via
> > > > >> >>>>>>>>> http to truly mimic ephemeral nodes which may not be as simple as all this
> > > > >> >>>>>>>>> http sounds), but I do think that if there is appetite for easy http access
> > > > >> >>>>>>>>> for consensus systems we should at least take the time to think about what
> > > > >> >>>>>>>>> it would take for us to provide this. In particular, I think we'd have to
> > > > >> >>>>>>>>> make it possible to create a node with a TTL that is not tied to a
> > > > >> >>>>>>>>> particular session.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Curious to see if anyone has any thoughts on this. It seems like a bit of
> > > > >> >>>>>>>>> a shame that ZK, which is a good battle-tested system, is frequently being
> > > > >> >>>>>>>>> passed-over these days because of the complexity of clients, and the fact
> > > > >> >>>>>>>>> that it is really pretty damn hard to do a client impl in certain
> > > > >> >>>>>>>>> languages (Ruby is the notable one I've heard).
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Best,
> > > > >> >>>>>>>>> C
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > >
> > >
>