Thanks, Ted, for making the case. I am sure the group had valid points. I did not follow the zero-conf point: is the concern that Helix needs to be run as a separate service? Helix can be used in both modes, as a service and as a library. We have deployed it in both modes and have seen the need for both within LinkedIn.
It would be really great if I could get the actual requirements and do another evaluation pass. Thanks, I appreciate your time in answering my questions.

Thanks,
Kishore G

On Sun, Apr 21, 2013 at 10:35 AM, Ted Dunning <[email protected]> wrote:

> Kishore,
>
> I made the case for Helix and the group seems to have strongly gravitated
> to the lower level that Curator provides.
>
> One feature that would have improved the case for Helix would have been
> viable zero-conf operation as an option.
>
> The game isn't over, however, and if you would like to get involved here
> on Drill, it might help to have another point of view.
>
> On Sun, Apr 21, 2013 at 9:08 AM, kishore g <[email protected]> wrote:
>
> > Hi Michael,
> >
> > Thanks for the update. Here are my thoughts, though I can't resist
> > telling good things about Helix since I am the author :-).
> >
> > Here is how I see zk v/s curator v/s helix.
> >
> > Zk is amazing for co-ordination and maintaining cluster data like
> > configuration, etc. It provides the concept of ephemeral nodes, which
> > can be used for liveness detection of a process. However, there are a
> > lot of corner cases that are non-trivial to code. Curator is a library
> > that makes it easy to use those apis; it provides recipes for leader
> > election, barrier, etc. Helix provides a much higher abstraction where
> > it treats various components of a distributed system as first class
> > citizens and allows system builders to think in terms of nodes,
> > resources, partitions, state machine etc. Helix underneath uses
> > zkclient (something like curator) to make it easy to interact with
> > zookeeper. We had plans to use curator but Helix needed really good
> > performance in terms of start up/fail over time and when we have 1000's
> > of partitions. We had to use low level apis of zk to achieve that.
> >
> > From my experience, while building distributed systems, cluster
> > management starts out very simple and one will be able to do a
> > prototype very quickly. But over time, things get complicated and need
> > many more features. At LinkedIn we started in a similar way where we
> > simply used some ephemeral nodes to know whether we have a lock or not.
> > But over time, a lot of things like controlling the assignment from
> > outside, evenly distributing locks, handing over locks gracefully,
> > restricting which nodes can own a partition, cluster expansion,
> > throttling of any cluster wide operations etc got complicated and we
> > ended up having to implement one solution for each feature. For every
> > feature, we took a lot of time to flush out issues with zk interaction
> > and we had huge scaling issues when we tried with 1000's of partitions
> > and a lot of ephemerals; it was a nightmare to debug. Over time, most
> > systems come up with a state machine; for example you can see hbase
> > master, yarn (job tracker, task tracker). It's kind of obvious that
> > having a state machine is the right way to build a large distributed
> > system: it allows you to have the right level of abstraction and is a
> > much cleaner design. What Helix did was to generalize this concept and
> > allow one to configure the state machine.
> >
> > All other features were basically built on top of states and
> > transitions. For example, we had some tasks that need to be distributed
> > among the nodes. When a node dies, its tasks should be taken up by
> > another node; this is simple using ephemeral nodes. But let's say you
> > want to limit the max tasks a node can handle. With Helix this is
> > modelled as a constraint and you can specify how many tasks can run on
> > a node, process etc; that is completely controlled from outside without
> > having to change the application code. Similarly, when the dead node
> > comes back, other nodes have to gracefully hand over their tasks. It's
> > not trivial to achieve this.
> >
> > There are a lot of other things we have encountered while building
> > distributed systems and we have always been able to add them to Helix
> > such that other systems can benefit from it. For example, I recently
> > presented how to test and debug large scale distributed systems. It
> > basically comes with tools which parse zk transaction logs and provide
> > the exact sequence of steps that lead to a failure. More details here
> > http://www.slideshare.net/KishoreGopalakrishna/data-driven-testing
> >
> > To summarize,
> >
> > So it's not really zk v/s curator v/s helix. It's basically the level
> > of abstraction one wants. One can build Helix using curator, which uses
> > zk underneath. So it basically boils down to what system you are
> > building and how complex it can get.
> >
> > There are definitely some use cases where Helix is not needed and is
> > probably overkill, but Apache Drill looks like a project that will get
> > pretty big and I am sure you will see all the requirements we saw over
> > time.
> >
> > Hope this helps. As I mentioned earlier, I will be happy to provide
> > more details and contribute.
> >
> > thanks,
> > Kishore G
> >
> > On Sun, Apr 21, 2013 at 1:57 AM, Michael Hausenblas <
> > [email protected]> wrote:
> >
> > > At the time I put the slides together, Helix was indeed considered.
> > > AFAICT, currently we seem to have settled on Netflix Curator [1],
> > > however. I wouldn't exclude the possibility that we may utilise Helix
> > > in future; personally, I think it's a great thing. Would be very
> > > interested in your experiences with it (also, re Zk vs. Curator vs.
> > > Helix).
> > >
> > > Cheers,
> > > Michael
> > >
> > > [1] https://github.com/Netflix/curator/wiki
> > >
> > > --
> > > Michael Hausenblas
> > > Ireland, Europe
> > > http://mhausenblas.info/
> > >
> > > On 21 Apr 2013, at 08:39, kishore g <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I was reading the slide deck from Hadoop summit
> > > >
> > > > http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> > > >
> > > > On slide 27, there is mention of using Helix for partition and
> > > > resource management. I could not find many details on
> > > > https://issues.apache.org/jira/browse/DRILL-53
> > > >
> > > > Can someone provide more details on this? We might be able to
> > > > contribute.
> > > >
> > > > thanks,
> > > > Kishore G
