Thanks, Ted, for making the case. I am pretty sure there were valid points raised.

I did not quite follow the zero-conf point. Is the concern that Helix needs to
be run as a separate service? Helix can be used in both modes: as a service
and as a library. We have deployed it in both modes, and we have seen the need
for both within LinkedIn.

It would be great if I could get the actual requirements and do another
evaluation pass.

Thanks, and I appreciate your time in answering my questions.

Thanks,
Kishore G


On Sun, Apr 21, 2013 at 10:35 AM, Ted Dunning <[email protected]> wrote:

> Kishore,
>
> I made the case for Helix and the group seems to have strongly gravitated
> to the lower level that Curator provides.
>
> One feature that would have improved the case for Helix would have been
> viable zero-conf operation as an option.
>
> The game isn't over, however, and if you would like to get involved here on
> Drill, it might help to have another point of view.
>
> On Sun, Apr 21, 2013 at 9:08 AM, kishore g <[email protected]> wrote:
>
> > Hi Michael,
> >
> > Thanks for the update. Here are my thoughts, though I can't resist saying
> > good things about Helix since I am the author :-).
> >
> > Here is how I see ZK vs. Curator vs. Helix.
> >
> > ZK is amazing for coordination and for maintaining cluster data such as
> > configuration. It provides the concept of ephemeral nodes, which can be
> > used for liveness detection of a process. However, there are a lot of
> > corner cases that are non-trivial to code. Curator is a library that makes
> > those APIs easy to use; it provides recipes such as leader election,
> > barriers, etc. Helix provides a much higher-level abstraction: it treats
> > the various components of a distributed system as first-class citizens and
> > allows system builders to think in terms of nodes, resources, partitions,
> > state machines, etc. Underneath, Helix uses zkclient (something like
> > Curator) to make it easy to interact with ZooKeeper. We had planned to use
> > Curator, but Helix needed really good performance in terms of start-up and
> > failover time when there are thousands of partitions, and we had to use
> > the low-level ZK APIs to achieve that.
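To make the ephemeral-node and leader-election ideas concrete, here is a toy, in-memory sketch of the classic ZooKeeper election recipe: each candidate creates an ephemeral-sequential node, the lowest sequence number leads, and each candidate watches only its predecessor. `ToyZk`, `leader`, and `predecessor` are hypothetical names, not the ZooKeeper or Curator API, and the sketch deliberately ignores the corner cases a real client must handle (connection loss vs. session expiry, retries, a create whose response was lost), which is exactly what a library like Curator takes care of.

```python
from typing import Dict, List


class ToyZk:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical; not the real client API).

    Models only what the election recipe needs: ephemeral-sequential nodes
    that disappear when the session that created them ends.
    """

    def __init__(self) -> None:
        self.owner: Dict[str, int] = {}  # znode path -> id of the owning session
        self.counter = 0  # real ZooKeeper keeps one counter per parent node

    def create_ephemeral_sequential(self, session: int, prefix: str) -> str:
        # ZooKeeper appends a zero-padded 10-digit sequence number to the name.
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.owner[path] = session
        return path

    def children(self, parent: str) -> List[str]:
        # With a shared prefix, zero padding makes sorted order equal creation order.
        return sorted(p for p in self.owner if p.startswith(parent + "/"))

    def expire_session(self, session: int) -> None:
        # A crashed process stops heartbeating; its ephemeral nodes are deleted.
        self.owner = {p: s for p, s in self.owner.items() if s != session}


def leader(zk: ToyZk, election: str) -> int:
    """The candidate holding the lowest sequence number is the leader (-1 if none)."""
    candidates = zk.children(election)
    return zk.owner[candidates[0]] if candidates else -1


def predecessor(zk: ToyZk, election: str, my_node: str) -> str:
    """Each candidate watches only the node just below its own, avoiding a herd effect."""
    candidates = zk.children(election)
    i = candidates.index(my_node)
    return candidates[i - 1] if i > 0 else ""


if __name__ == "__main__":
    zk = ToyZk()
    n1 = zk.create_ephemeral_sequential(1, "/election/n_")
    n2 = zk.create_ephemeral_sequential(2, "/election/n_")
    print(leader(zk, "/election"))           # 1
    print(predecessor(zk, "/election", n2))  # /election/n_0000000000
    zk.expire_session(1)                     # the leader's process dies
    print(leader(zk, "/election"))           # 2
```

Liveness falls out of session expiry: when the leader's session ends, its node vanishes and the next-lowest candidate takes over without any explicit hand-off.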
> >
> > From my experience building distributed systems, cluster management
> > starts out very simple, and one can build a prototype very quickly.
> > But over time, things get complicated and many more features are needed.
> > At LinkedIn we started in a similar way: we simply used some ephemeral
> > nodes to know whether we held a lock or not. But over time, a lot of
> > things, such as controlling the assignment from outside, evenly
> > distributing locks, handing over locks gracefully, restricting which nodes
> > can own a partition, cluster expansion, and throttling of cluster-wide
> > operations, got complicated, and we ended up implementing a separate
> > solution for each feature. For every feature, it took us a lot of time to
> > flush out issues with ZK interaction, and we had huge scaling issues when
> > we tried thousands of partitions and lots of ephemeral nodes; it was a
> > nightmare to debug. Over time, most systems end up with a state machine;
> > see, for example, the HBase master, YARN, or the JobTracker/TaskTracker.
> > It's fairly obvious that a state machine is the right way to build a
> > large distributed system: it gives you the right level of abstraction and
> > a much cleaner design. What Helix did was generalize this concept and
> > allow one to configure the state machine.
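As a rough illustration of "configuring the state machine", the sketch below declares the legal per-replica transitions as data and derives the shortest sequence of transitions needed to move a replica from one state to another. The `StateModel` class and the OFFLINE/SLAVE/MASTER states are illustrative assumptions, not the Helix API; a real controller would also enforce how many replicas may sit in each state and dispatch the transitions to participant nodes.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

Transition = Tuple[str, str]


class StateModel:
    """A state machine declared as data (illustrative sketch, not the Helix API)."""

    def __init__(self, transitions: FrozenSet[Transition]) -> None:
        self.transitions = transitions

    def path(self, src: str, dst: str) -> List[Transition]:
        """Shortest sequence of legal transitions from src to dst (breadth-first search)."""
        prev: Dict[str, str] = {src: src}
        queue = deque([src])
        while queue and dst not in prev:
            state = queue.popleft()
            for a, b in sorted(self.transitions):
                if a == state and b not in prev:
                    prev[b] = state
                    queue.append(b)
        if dst not in prev:
            raise ValueError(f"no legal path from {src} to {dst}")
        steps: List[Transition] = []
        while dst != src:  # walk back from dst to src, then reverse
            steps.append((prev[dst], dst))
            dst = prev[dst]
        return steps[::-1]


# Hypothetical configuration: a replica must become SLAVE before it can be MASTER.
MASTER_SLAVE = StateModel(frozenset({
    ("OFFLINE", "SLAVE"), ("SLAVE", "MASTER"),
    ("MASTER", "SLAVE"), ("SLAVE", "OFFLINE"),
}))

if __name__ == "__main__":
    print(MASTER_SLAVE.path("OFFLINE", "MASTER"))
    # [('OFFLINE', 'SLAVE'), ('SLAVE', 'MASTER')]
```

Because the transitions are plain data, changing the model (say, allowing OFFLINE to go straight to MASTER) is a configuration change rather than a code change in every participant.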
> >
> > All the other features were built on top of states and transitions.
> > For example, we had some tasks that needed to be distributed among the
> > nodes. When a node dies, its tasks should be taken up by another node;
> > this is simple using ephemeral nodes. But say you want to limit the
> > maximum number of tasks a node can handle. With Helix this is modeled as a
> > constraint: you can specify how many tasks can run on a node, process,
> > etc., and this is controlled completely from outside without having to
> > change the application code. Similarly, when the dead node comes back, the
> > other nodes have to gracefully hand some of their tasks back to it. It's
> > not trivial to achieve this.
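The task example above can be sketched as a pure planning function: given the current owners, the live nodes, and a per-node cap, it keeps tasks where they are when possible, fails over orphaned tasks to the least-loaded node, and hands tasks back when a node rejoins. `rebalance` and `max_per_node` are hypothetical names standing in for an externally configured constraint; this is not how Helix implements it, and actually executing each move gracefully (the old owner releasing before the new one takes over) is not modeled.

```python
from typing import Dict, List, Tuple


def rebalance(current: Dict[str, List[str]], tasks: List[str],
              live_nodes: List[str], max_per_node: int
              ) -> Tuple[Dict[str, List[str]], List[str]]:
    """Compute a new node -> tasks plan that moves as few tasks as possible.

    Hypothetical helper; max_per_node plays the role of the constraint that
    is configured from outside the application.
    """
    wanted = set(tasks)
    nodes = sorted(live_nodes)
    # Keep tasks where they are if the owner is still alive and under the cap.
    plan = {n: [t for t in current.get(n, []) if t in wanted][:max_per_node]
            for n in nodes}
    placed = {t for owned in plan.values() for t in owned}
    unassigned: List[str] = []

    # 1. Fail over orphaned tasks (dead owner or over the cap) to the least-loaded node.
    for task in sorted(wanted - placed):
        open_nodes = [n for n in nodes if len(plan[n]) < max_per_node]
        if not open_nodes:
            unassigned.append(task)  # respect the cap rather than overload a node
            continue
        target = min(open_nodes, key=lambda n: (len(plan[n]), n))
        plan[target].append(task)

    # 2. Even out load (e.g. after a node rejoins): hand tasks from the busiest
    #    node to the idlest until their counts differ by at most one.
    while nodes:
        busiest = max(nodes, key=lambda n: (len(plan[n]), n))
        idlest = min(nodes, key=lambda n: (len(plan[n]), n))
        if len(plan[busiest]) - len(plan[idlest]) <= 1:
            break
        plan[idlest].append(plan[busiest].pop())
    return plan, unassigned


if __name__ == "__main__":
    tasks = ["t1", "t2", "t3", "t4"]
    plan, left = rebalance({}, tasks, ["A", "B"], max_per_node=3)
    print(plan, left)  # {'A': ['t1', 't3'], 'B': ['t2', 't4']} []
    plan, left = rebalance(plan, tasks, ["A"], max_per_node=3)       # B dies
    print(plan, left)  # {'A': ['t1', 't3', 't2']} ['t4']
    plan, left = rebalance(plan, tasks, ["A", "B"], max_per_node=3)  # B returns
    print(plan, left)  # {'A': ['t1', 't3'], 'B': ['t4', 't2']} []
```

Note how the cap changes behavior without touching task code: while B is down, t4 stays unassigned instead of pushing A past three tasks, and when B returns, A hands t2 back.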
> >
> > There are a lot of other things we have encountered while building
> > distributed systems, and we have always been able to add them to Helix so
> > that other systems can benefit from them. For example, I recently gave a
> > presentation on how to test and debug large-scale distributed systems.
> > Helix comes with tools that parse ZK transaction logs and provide the
> > exact sequence of steps that led to a failure. More details here:
> > http://www.slideshare.net/KishoreGopalakrishna/data-driven-testing
> >
> > To summarize:
> >
> > It's not really ZK vs. Curator vs. Helix; it's about the level of
> > abstraction one wants. One could build Helix using Curator, which uses ZK
> > underneath. So it boils down to what system you are building and how
> > complex it can get.
> >
> > There are definitely some use cases where Helix is not needed and is
> > probably overkill, but Apache Drill looks like a project that will get
> > pretty big, and I am sure you will run into all the requirements we saw
> > over time.
> >
> > Hope this helps. As I mentioned earlier, I will be happy to provide more
> > details and contribute.
> >
> > thanks,
> > Kishore G
> >
> > On Sun, Apr 21, 2013 at 1:57 AM, Michael Hausenblas <
> > [email protected]> wrote:
> >
> > >
> > > At the time I put the slides together, Helix was indeed considered.
> > > AFAICT, currently we seem to have settled on Netflix Curator [1],
> > > however. I wouldn't exclude the possibility that we may utilise Helix in
> > > future; personally, I think it's a great thing. Would be very interested
> > > in your experiences with it (also, re Zk vs. Curator vs. Helix).
> > >
> > > Cheers,
> > >                 Michael
> > >
> > > [1] https://github.com/Netflix/curator/wiki
> > >
> > > --
> > > Michael Hausenblas
> > > Ireland, Europe
> > > http://mhausenblas.info/
> > >
> > > On 21 Apr 2013, at 08:39, kishore g <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I was reading the slide deck from the Hadoop Summit:
> > > > http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> > > >
> > > > On slide 27, there is a mention of using Helix for partition and
> > > > resource management. I could not find many details on
> > > > https://issues.apache.org/jira/browse/DRILL-53
> > > >
> > > > Can someone provide more details on this? We might be able to
> > > > contribute.
> > > >
> > > > thanks,
> > > > Kishore G
> > >
> > >
> >
>
