I have just been trying out Hazelcast on another project and have been very impressed by its simplicity. Very nice.

On Sat, Apr 27, 2013 at 8:38 PM, kishore g <[email protected]> wrote:
> Hi Jacques,
>
> Just added a recipe for service discovery using Helix. More details here
> http://helix.incubator.apache.org/recipes/service_discovery.html and see
> the sample code here
>
> https://github.com/apache/incubator-helix/tree/master/recipes/service-discovery/src/main/java/org/apache/helix/servicediscovery
>
> Note there is no need to run a separate Helix controller. I have listed
> some benefits on the recipe page. There are some more features that Drill
> can benefit from in terms of operation; for example, you can execute
> commands on each drill bit node and add custom message handlers. Helix
> comes with a messaging service with which you can command the nodes to
> perform ad hoc tasks. There is also a REST admin interface that provides
> cluster state and also performs admin operations.
>
> Thanks,
> Kishore G
>
> On Tue, Apr 23, 2013 at 2:34 PM, Jacques Nadeau <[email protected]> wrote:
> > The concept of role determination by ZK is interesting but I'm not
> > sure that level of complexity is needed when nodes are fairly static
> > in their roles.
> >
> > Thanks for the information. I need to think more about this.
> >
> > J
> >
> > On Mon, Apr 22, 2013 at 11:09 AM, kishore g <[email protected]> wrote:
> > > Hi Jacques,
> > >
> > > Thanks for the pointer, had a quick look and it is indeed very
> > > simple. You just have the need for service discovery. The slide
> > > indicated that there was some need for partition and resource
> > > management, but it looks like the actual requirement is quite
> > > different.
> > >
> > > While this can still be done through Helix, I don't see much value
> > > in using it if the requirement stays the same.
> > >
> > > Few things to ensure:
> > > 1) You are not setting any watchers but instead reading all zookeeper
> > > znodes every X seconds. This is good to avoid the herd effect during
> > > start-up of nodes but might need some tuning when you have a large
> > > number of nodes. I didn't check whether the Curator library uses the
> > > zk async library (you might want to make sure it uses that).
> > > 2) It is not clear how you plan to handle error scenarios. What if
> > > the node fails to start up or is flapping? How will you know that a
> > > node is not part of the cluster? Do you plan to have a list of nodes
> > > elsewhere and compare the two?
> > > 3) How do you plan to blacklist a node that is behaving badly? Do you
> > > envision providing an admin api later that will allow one to
> > > disable/enable such nodes?
> > > 4) Do you envision each node having multiple service names? For
> > > example, if you are using Sparrow I am assuming a few nodes will be
> > > schedulers and others workers. Is it possible for a node to be both
> > > scheduler and worker? If yes, how will a node know if it has to be
> > > scheduler/worker/both?
> > >
> > > The reason I bring up these points: the way it is designed right now,
> > > the nodes own the configuration (host, port, service types, etc.) and
> > > when they start up they simply put that information in zk and make it
> > > available for others to discover. Helix advocates a different
> > > methodology: the node simply starts up and does not know what it has
> > > to do; all actions/configuration come from outside via transitions.
> > > Which means all nodes start up with exactly the same configuration,
> > > just an id and zookeeper address. So it really depends on how much
> > > configuration you have and if you want that to be dynamically changed
> > > or you are ok with pushing the config to each node and restarting it.
> > > It kind of falls in the operability space and it's probably too early
> > > to have a clear picture about that, but it makes quite a difference
> > > over the long run.
> > >
> > > Hope this helps and thanks again for your time.
> > >
> > > Thanks,
> > > Kishore G
> > >
> > > On Sun, Apr 21, 2013 at 8:10 PM, Jacques Nadeau <[email protected]> wrote:
> > > > Hey Kishore,
> > > >
> > > > I'm really excited about Helix. It is great to see the toolbox
> > > > starting to be filled with such powerful tools. Some random thoughts
> > > > with regards to Helix/Curator/etc.
> > > >
> > > > It seems like we're trying to avoid even supporting a number of things
> > > > that the Helix framework provides. We really want to avoid a master
> > > > node. We hope to avoid the concept of particular nodes holding
> > > > specific resources. (As a query engine, we don't currently have the
> > > > concept of things like regions.) We're trying to build upon Berkeley's
> > > > Sparrow work and avoid the concept of centralized scheduling. The
> > > > driving node for a particular query is the only entity responsible for
> > > > pushing a query to completion and has direct RPC interaction with its
> > > > 'children'.
> > > >
> > > > Our current use of zookeeper is strictly for the purpose of service
> > > > registration and membership information. If you want to see the (lack
> > > > of) complexity of our use right now, you can look here:
> > > >
> > > > https://github.com/apache/incubator-drill/tree/execwork/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/coord
> > > >
> > > > Thoughts?
> > > >
> > > > Jacques
> > > >
> > > > On Sun, Apr 21, 2013 at 2:05 PM, kishore g <[email protected]> wrote:
> > > > > Thanks Ted for making a case. I am pretty sure there were valid
> > > > > points.
> > > > >
> > > > > I did not get the zero-conf option; is it the case that Helix
> > > > > needs to be run as a separate service?
> > > > > Helix can be used in both modes, as a service and also as a
> > > > > library. We have deployed it in both modes and we have seen the
> > > > > need for it within LinkedIn.
> > > > >
> > > > > It would be really great if I can get the actual requirements and do
> > > > > another pass evaluating.
> > > > >
> > > > > Thanks and appreciate your time in answering my questions.
> > > > >
> > > > > Thanks,
> > > > > Kishore G
> > > > >
> > > > > On Sun, Apr 21, 2013 at 10:35 AM, Ted Dunning <[email protected]> wrote:
> > > > > > Kishore,
> > > > > >
> > > > > > I made the case for Helix and the group seems to have strongly
> > > > > > gravitated to the lower level that Curator provides.
> > > > > >
> > > > > > One feature that would have improved the case for Helix would
> > > > > > have been viable zero-conf operation as an option.
> > > > > >
> > > > > > The game isn't over, however, and if you would like to get
> > > > > > involved here on Drill, it might help to have another point of
> > > > > > view.
> > > > > >
> > > > > > On Sun, Apr 21, 2013 at 9:08 AM, kishore g <[email protected]> wrote:
> > > > > > > Hi Michael,
> > > > > > >
> > > > > > > Thanks for the update. Here are my thoughts, though I can't
> > > > > > > resist telling good things about Helix since I am the
> > > > > > > author :-).
> > > > > > >
> > > > > > > Here is how I see zk v/s curator v/s helix.
> > > > > > >
> > > > > > > Zk is amazing for co-ordination and maintaining cluster data
> > > > > > > like configuration, etc. It provides the concept of ephemeral
> > > > > > > nodes, which can be used for liveness detection of a process.
> > > > > > > However, there are a lot of corner cases that are non-trivial
> > > > > > > to code. Curator is a library that makes it easy to use those
> > > > > > > apis; it provides recipes for leader election, barriers, etc.
> > > > > > > Helix provides a much higher abstraction where it treats
> > > > > > > various components of a distributed system as first class
> > > > > > > citizens and allows system builders to think in terms of
> > > > > > > nodes, resources, partitions, state machines, etc. Helix
> > > > > > > underneath uses zkclient (something like Curator) to make it
> > > > > > > easy to interact with zookeeper. We had plans to use Curator
> > > > > > > but Helix needed really good performance in terms of
> > > > > > > start-up/failover time and when we have 1000's of partitions.
> > > > > > > We had to use the low-level apis of zk to achieve that.
> > > > > > >
> > > > > > > From my experience, while building distributed systems cluster
> > > > > > > management starts out very simple and one will be able to do a
> > > > > > > prototype very quickly. But over time, things get complicated
> > > > > > > and need many more features. At LinkedIn we started in a
> > > > > > > similar way where we simply used some ephemeral nodes to know
> > > > > > > whether we have a lock or not. But over time, a lot of things
> > > > > > > like controlling the assignment from outside, evenly
> > > > > > > distributing locks, handing over locks gracefully, restricting
> > > > > > > which nodes can own a partition, cluster expansion, throttling
> > > > > > > of any cluster-wide operations, etc. got complicated and we
> > > > > > > ended up having to implement one solution for each feature.
> > > > > > > For every feature, we took a lot of time to flush out issues
> > > > > > > with zk interaction and we had huge scaling issues when we
> > > > > > > tried with 1000's of partitions and lots of ephemerals; it was
> > > > > > > a nightmare to debug. Over time, most systems come up with a
> > > > > > > state machine; for example, you can see hbase master, yarn
> > > > > > > (job tracker, task tracker).
> > > > > > > It's kind of obvious that having a state machine is the right
> > > > > > > way to build a large distributed system; it allows you to have
> > > > > > > the right level of abstraction and is a much cleaner design.
> > > > > > > What Helix did was to generalize this concept and allow one to
> > > > > > > configure the state machine.
> > > > > > >
> > > > > > > All other features were basically built on top of states and
> > > > > > > transitions. For example, we had some tasks that need to be
> > > > > > > distributed among the nodes. When a node dies its tasks should
> > > > > > > be taken up by another node; this is simple using ephemeral
> > > > > > > nodes. But let's say you want to limit the max tasks a node
> > > > > > > can handle. With Helix this is modelled as a constraint and
> > > > > > > you can specify how many tasks can run on a node, process,
> > > > > > > etc.; that is completely controlled from outside without
> > > > > > > having to change the application code. Similarly, when the
> > > > > > > dead node comes back other nodes have to gracefully hand over
> > > > > > > their tasks. It's not trivial to achieve this.
> > > > > > >
> > > > > > > There are a lot of other things we have encountered while
> > > > > > > building distributed systems and we have always been able to
> > > > > > > add them to Helix such that other systems can benefit from it.
> > > > > > > For example, I recently presented how to test and debug large
> > > > > > > scale distributed systems. It basically comes with tools which
> > > > > > > parse zk transaction logs and provide the exact sequence of
> > > > > > > steps that lead to a failure. More details here
> > > > > > > http://www.slideshare.net/KishoreGopalakrishna/data-driven-testing
> > > > > > >
> > > > > > > To summarize,
> > > > > > >
> > > > > > > So it's not really zk v/s curator v/s helix. It's basically
> > > > > > > the level of abstraction one wants.
> > > > > > > One can build Helix using Curator, which uses zk underneath.
> > > > > > > So it basically boils down to what system you are building and
> > > > > > > how complex it can get.
> > > > > > >
> > > > > > > There are definitely some use cases where Helix is not needed
> > > > > > > and is probably overkill, but Apache Drill looks like a
> > > > > > > project that will get pretty big and I am sure you will see
> > > > > > > all the requirements we saw over time.
> > > > > > >
> > > > > > > Hope this helps. As I mentioned earlier, I will be happy to
> > > > > > > provide more details and contribute.
> > > > > > >
> > > > > > > thanks,
> > > > > > > Kishore G
> > > > > > >
> > > > > > > On Sun, Apr 21, 2013 at 1:57 AM, Michael Hausenblas <[email protected]> wrote:
> > > > > > > > At the time I put the slides together, Helix was indeed
> > > > > > > > considered. AFAICT, currently we seem to have settled on
> > > > > > > > Netflix Curator [1], however. I wouldn't exclude the
> > > > > > > > possibility that we may utilise Helix in future; personally,
> > > > > > > > I think it's a great thing. Would be very interested in your
> > > > > > > > experiences with it (also, re Zk vs. Curator vs. Helix).
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Michael
> > > > > > > >
> > > > > > > > [1] https://github.com/Netflix/curator/wiki
> > > > > > > >
> > > > > > > > --
> > > > > > > > Michael Hausenblas
> > > > > > > > Ireland, Europe
> > > > > > > > http://mhausenblas.info/
> > > > > > > >
> > > > > > > > On 21 Apr 2013, at 08:39, kishore g <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I was reading the slide deck from Hadoop Summit
> > > > > > > > >
> > > > > > > > > http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> > > > > > > > >
> > > > > > > > > On slide 27, there is mention of using Helix for partition
> > > > > > > > > and resource management. I could not find many details on
> > > > > > > > > https://issues.apache.org/jira/browse/DRILL-53
> > > > > > > > >
> > > > > > > > > Can someone provide more details on this? We might be able
> > > > > > > > > to contribute.
> > > > > > > > >
> > > > > > > > > thanks,
> > > > > > > > > Kishore G
