Hi Michael,

Thanks for the update. Here are my thoughts, though I can't resist saying
good things about Helix since I am its author :-).

Here is how I see ZK vs. Curator vs. Helix.

ZK is amazing for coordination and for maintaining cluster data such as
configuration. It provides ephemeral nodes, which can be used to detect
whether a process is alive. However, there are a lot of corner cases that
are non-trivial to code. Curator is a library that makes those APIs easy
to use and provides recipes such as leader election and barriers. Helix
provides a much higher-level abstraction: it treats the various components
of a distributed system as first-class citizens and lets system builders
think in terms of nodes, resources, partitions, state machines, etc.
Underneath, Helix uses zkclient (a library similar to Curator) to interact
with ZooKeeper. We had planned to use Curator, but Helix needed really good
start-up and failover times even with thousands of partitions, and we had
to use ZK's low-level APIs to achieve that.
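For readers less familiar with ephemeral nodes, here is a toy, in-memory
model in plain Python. It is not the ZooKeeper or Curator API; `ToyZk` and
all of its methods are hypothetical. Each ephemeral node belongs to a client
session and disappears when that session expires after missed heartbeats,
which is what makes it usable both as a liveness marker and as a simple
lock. Real ZooKeeper tracks sessions on the server and notifies watchers,
and the corner cases mentioned above are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyZk:
    """Hypothetical in-memory stand-in for ZK sessions and ephemeral nodes (not the real API)."""
    session_timeout: float
    nodes: dict = field(default_factory=dict)           # path -> owning session id
    last_heartbeat: dict = field(default_factory=dict)  # session id -> time of last heartbeat

    def connect(self, session_id, now):
        self.last_heartbeat[session_id] = now

    def heartbeat(self, session_id, now):
        if session_id in self.last_heartbeat:
            self.last_heartbeat[session_id] = now

    def create_ephemeral(self, path, session_id):
        """Create a node owned by the session; fails if the path exists (a simple lock)."""
        if session_id not in self.last_heartbeat:
            raise ValueError(f"unknown or expired session {session_id}")
        if path in self.nodes:
            return False
        self.nodes[path] = session_id
        return True

    def exists(self, path):
        return path in self.nodes

    def expire_sessions(self, now):
        """Expire silent sessions and delete every ephemeral node they own."""
        expired = {s for s, t in self.last_heartbeat.items() if now - t > self.session_timeout}
        for s in expired:
            del self.last_heartbeat[s]
        self.nodes = {p: s for p, s in self.nodes.items() if s not in expired}
        return expired


zk = ToyZk(session_timeout=10.0)
for node in ("node-a", "node-b"):
    zk.connect(node, now=0.0)
    zk.create_ephemeral(f"/live/{node}", node)                # liveness marker
lock_a = zk.create_ephemeral("/locks/partition-0", "node-a")  # True: node-a gets the lock
lock_b = zk.create_ephemeral("/locks/partition-0", "node-b")  # False: already held
zk.heartbeat("node-b", now=8.0)                               # node-a goes silent
expired = zk.expire_sessions(now=15.0)                        # node-a's session times out
# /live/node-a and /locks/partition-0 are gone, so node-b can now take the lock.
```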

From my experience building distributed systems, cluster management
starts out very simple and one can build a prototype very quickly. But
over time things get complicated and many more features are needed. At
LinkedIn we started the same way, simply using a few ephemeral nodes to
know whether or not we held a lock. Over time, many things became
complicated: controlling the assignment from outside, distributing locks
evenly, handing over locks gracefully, restricting which nodes can own a
partition, cluster expansion, throttling cluster-wide operations, etc. We
ended up implementing a separate solution for each feature. For every
feature it took a lot of time to flush out issues with ZK interaction, and
we had huge scaling issues when we tried thousands of partitions and many
ephemeral nodes; it was a nightmare to debug. Eventually most systems end
up with a state machine; see, for example, the HBase master, or Hadoop's
JobTracker/TaskTracker and YARN. It seems clear that a state machine is
the right way to build a large distributed system: it gives you the right
level of abstraction and a much cleaner design. What Helix did was
generalize this concept and make the state machine configurable.
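To make "configuring the state machine" concrete, here is a minimal sketch
in plain Python. It is not Helix's API: `StateModel` and `path` are
hypothetical names, and the model below is only loosely shaped like a
master/slave replica. The point is that the states and legal transitions
are data, so a controller can work out a legal sequence of transitions
(here with a breadth-first search, one simple choice) and ask a
participant to execute them one by one.

```python
from collections import deque


class StateModel:
    """Hypothetical configurable state machine: states plus allowed transitions."""

    def __init__(self, transitions):
        self.next_states = {}
        for src, dst in transitions:
            self.next_states.setdefault(src, set()).add(dst)
            self.next_states.setdefault(dst, set())

    def path(self, current, target):
        """Return the shortest list of (src, dst) transitions from current to target."""
        if current == target:
            return []
        parent = {current: None}
        queue = deque([current])
        while queue:
            state = queue.popleft()
            for nxt in sorted(self.next_states.get(state, ())):
                if nxt in parent:
                    continue
                parent[nxt] = state
                if nxt == target:
                    # Walk back through parents to rebuild the transition list.
                    steps = []
                    while parent[nxt] is not None:
                        steps.append((parent[nxt], nxt))
                        nxt = parent[nxt]
                    return steps[::-1]
                queue.append(nxt)
        raise ValueError(f"no legal path from {current} to {target}")


# A model loosely shaped like a master/slave replica (illustrative only).
master_slave = StateModel([
    ("OFFLINE", "SLAVE"), ("SLAVE", "OFFLINE"),
    ("SLAVE", "MASTER"), ("MASTER", "SLAVE"),
])

print(master_slave.path("OFFLINE", "MASTER"))
# [('OFFLINE', 'SLAVE'), ('SLAVE', 'MASTER')]
```

Changing the system's behaviour then means changing the transition list,
not the code that walks it.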

All the other features were built on top of states and transitions. For
example, we had some tasks that needed to be distributed among the nodes.
When a node dies, its tasks should be taken over by another node, which is
simple using ephemeral nodes. But say you want to limit the maximum number
of tasks a node can handle. In Helix this is modelled as a constraint: you
can specify how many tasks can run on a node, a process, etc., and this is
controlled completely from outside, without having to change the
application code. Similarly, when the dead node comes back, the other
nodes have to gracefully hand some of their tasks back to it. That is not
trivial to achieve.
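As a rough illustration of a per-node limit that lives outside the
application, here is a hypothetical rebalancing function in plain Python.
It is not Helix's rebalancer: the name `rebalance`, its arguments and the
keep-tasks-in-place policy are all assumptions. The application would
simply run whatever it is assigned; the limit is just a parameter.
Sequencing the handover itself (stopping a task on the old node before
starting it on the new one) is what state transitions would handle, and is
not shown.

```python
import math


def rebalance(tasks, live_nodes, max_per_node, current):
    """Hypothetical rebalancer: spread tasks evenly over live nodes, capped per node.

    `current` maps task -> node from the previous round. A task stays where it
    is unless its node is dead or already holds its share, so a returning
    node only receives tasks that other nodes give up.
    Returns (assignment, unassigned), where assignment maps node -> sorted tasks.
    """
    if not live_nodes:
        return {}, sorted(tasks)
    # Even share per node, but never more than the configured constraint.
    target = min(max_per_node, math.ceil(len(tasks) / len(live_nodes)))
    assignment = {node: [] for node in live_nodes}
    pending = []
    for task in sorted(tasks):
        owner = current.get(task)
        if owner in assignment and len(assignment[owner]) < target:
            assignment[owner].append(task)  # keep the task where it already runs
        else:
            pending.append(task)  # owner is dead or already at its share
    unassigned = []
    for task in pending:
        # Hand the task to the least-loaded live node (ties broken by name).
        node = min(live_nodes, key=lambda n: (len(assignment[n]), n))
        if len(assignment[node]) < target:
            assignment[node].append(task)
        else:
            unassigned.append(task)  # placing it would violate the constraint
    return {n: sorted(ts) for n, ts in assignment.items()}, unassigned


def owners(assignment):
    """Invert node -> tasks into task -> node."""
    return {t: n for n, ts in assignment.items() for t in ts}


tasks = [f"task-{i}" for i in range(6)]
first, _ = rebalance(tasks, ["a", "b", "c"], max_per_node=3, current={})
# Node c dies: its tasks move to a and b, which may now hold up to 3 each.
after_failure, _ = rebalance(tasks, ["a", "b"], max_per_node=3, current=owners(first))
# Tightening the constraint to 2 leaves the orphaned tasks unassigned instead.
capped, waiting = rebalance(tasks, ["a", "b"], max_per_node=2, current=owners(first))
# Node c returns: a and b each hand one task back to it.
recovered, _ = rebalance(tasks, ["a", "b", "c"], max_per_node=3, current=owners(after_failure))
```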

There are a lot of other things we have encountered while building
distributed systems, and we have always been able to add them to Helix so
that other systems can benefit. For example, I recently presented on how to
test and debug large-scale distributed systems. Helix comes with tools that
parse ZK transaction logs and reconstruct the exact sequence of steps that
led to a failure. More details here:
http://www.slideshare.net/KishoreGopalakrishna/data-driven-testing

To summarize,

It's not really ZK vs. Curator vs. Helix; it's about the level of
abstraction one wants. One could build Helix using Curator, which uses ZK
underneath. So it boils down to what system you are building and how
complex it can get.

There are definitely some use cases where Helix is not needed and is
probably overkill, but Apache Drill looks like a project that will get
pretty big, and I am sure you will run into all the requirements we saw
over time.

Hope this helps. As I mentioned earlier, I will be happy to provide more
details and contribute.

thanks,
Kishore G

On Sun, Apr 21, 2013 at 1:57 AM, Michael Hausenblas <
[email protected]> wrote:

>
> At the time I put the slides together, Helix was indeed considered.
> AFAICT, currently we seem to have settled on Netflix Curator [1], however.
> I wouldn't exclude the possibility that we may utilise Helix in future;
> personally, I think it's a great thing.  Would be very interested in your
> experiences with it (also, re Zk vs. Curator vs. Helix).
>
> Cheers,
>                 Michael
>
> [1] https://github.com/Netflix/curator/wiki
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 21 Apr 2013, at 08:39, kishore g <[email protected]> wrote:
>
> > Hello,
> >
> > I was reading the slide deck from Hadoop Summit:
> >
> > http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> >
> > On slide 27, there is a mention of using Helix for partition and resource
> > management. I could not find many details on
> > https://issues.apache.org/jira/browse/DRILL-53
> >
> > Can someone provide more details on this? We might be able to
> > contribute.
> >
> > thanks,
> > Kishore G
>
>