I have just been trying out Hazelcast on another project and have been very
impressed by its simplicity.  Very nice.




On Sat, Apr 27, 2013 at 8:38 PM, kishore g <[email protected]> wrote:

> Hi Jacques,
>
> Just added a recipe for service discovery using Helix. More details here
> http://helix.incubator.apache.org/recipes/service_discovery.html and see
> the sample code here
>
> https://github.com/apache/incubator-helix/tree/master/recipes/service-discovery/src/main/java/org/apache/helix/servicediscovery
>
> Note that there is no need to run a separate Helix controller; I have
> listed some benefits on the recipe page. There are some more features
> Drill could benefit from operationally. For example, you can execute
> commands on each Drillbit node and add custom message handlers: Helix
> comes with a messaging service through which you can command the nodes
> to perform ad hoc tasks. There is also a REST admin interface that
> provides the cluster state and performs admin operations.
>
> Thanks,
> Kishore G
>
>
>
>
> On Tue, Apr 23, 2013 at 2:34 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > The concept of role determination by ZK is interesting but I'm not
> > sure that level of complexity is needed when nodes are fairly static
> > in their roles.
> >
> > Thanks for the information.  I need to think more about this.
> >
> > J
> >
> >
> >
> > On Mon, Apr 22, 2013 at 11:09 AM, kishore g <[email protected]> wrote:
> > > Hi Jacques,
> > >
> > > Thanks for the pointer. I had a quick look, and it is indeed very
> > > simple; you just need service discovery. The slides suggested there
> > > was some need for partition and resource management, but it looks
> > > like the actual requirement is quite different.
> > >
> > > While this could still be done through Helix, I don't see much value
> > > in using it if the requirement stays the same.
> > >
> > > A few things to ensure:
> > > 1) You are not setting any watchers, but instead reading all
> > > ZooKeeper znodes every X seconds. This is good for avoiding the herd
> > > effect during node start-up, but it might need some tuning when you
> > > have a large number of nodes. I didn't check whether the Curator
> > > library uses the ZK async API (you might want to make sure it does).
> > > 2) It is not clear how you plan to handle error scenarios. What if a
> > > node fails to start up or is flapping? How will you know that a node
> > > is not part of the cluster? Do you plan to keep a list of nodes
> > > elsewhere and compare the two?
> > > 3) How do you plan to blacklist a node that is behaving badly? Do
> > > you envision providing an admin API later that will allow one to
> > > disable/enable such nodes?
> > > 4) Do you envision each node having multiple service names? For
> > > example, if you are using Sparrow, I assume a few nodes will be
> > > schedulers and the others workers. Is it possible for a node to be
> > > both a scheduler and a worker? If yes, how will a node know whether
> > > it has to be a scheduler, a worker, or both?
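Point 1 above can be sketched as follows. This is a minimal, stdlib-only simulation: the in-memory map stands in for the children of a ZK parent znode, and the class, method, and path names are all hypothetical, not Drill's or Curator's actual API:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simulates polling-based discovery: instead of setting a ZK watcher per
// znode (which can cause a herd effect when many nodes start at once), the
// client re-reads the full membership list every X seconds, so a change is
// only visible on the next poll.
public class PollingDiscovery {
    // Stand-in for the children of a ZK parent znode (e.g. /drill/bits/*).
    private final Map<String, String> registry = new ConcurrentHashMap<>();

    public void register(String nodeId, String hostPort) {
        registry.put(nodeId, hostPort);
    }

    public void deregister(String nodeId) {
        registry.remove(nodeId);
    }

    // In a real client this would run on a scheduled executor every X seconds.
    public Set<String> poll() {
        return new HashSet<>(registry.keySet());
    }

    public static void main(String[] args) {
        PollingDiscovery d = new PollingDiscovery();
        d.register("drillbit-1", "host1:31010");
        d.register("drillbit-2", "host2:31010");
        System.out.println(d.poll()); // both nodes visible on this poll
        d.deregister("drillbit-2");
        System.out.println(d.poll()); // drillbit-2 is gone only after re-polling
    }
}
```

The trade-off is exactly the one raised above: no watcher storms, but membership changes are only seen with up to X seconds of delay, which may need tuning as the cluster grows.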
> > >
> > > The reason I bring up these points: the way it is designed right
> > > now, the nodes own the configuration (host, port, service types,
> > > etc.), and when they start up they simply put that information in ZK
> > > and make it available for others to discover. Helix advocates a
> > > different methodology: the node simply starts up and does not know
> > > what it has to do; all actions/configuration come from outside via
> > > transitions. This means all nodes start up with exactly the same
> > > configuration, just an id and a ZooKeeper address. So it really
> > > depends on how much configuration you have and whether you want it
> > > to be changed dynamically, or are OK with pushing the config to each
> > > node and restarting it. This falls into the operability space, and
> > > it is probably too early to have a clear picture of that, but it
> > > makes quite a difference over the long run.
> > >
> > > Hope this helps and thanks again for your time.
> > >
> > > Thanks,
> > > Kishore G
> > >
> > >
> > >
> > > On Sun, Apr 21, 2013 at 8:10 PM, Jacques Nadeau <[email protected]>
> > wrote:
> > >
> > >> Hey Kishore,
> > >>
> > >> I'm really excited about Helix.  It is great to see the toolbox
> > >> starting to be filled with such powerful tools.  Some random
> > >> thoughts with regard to Helix/Curator/etc.
> > >>
> > >> It seems like we're trying to avoid even supporting a number of things
> > >> that the Helix framework provides.  We really want to avoid a master
> > >> node.  We hope to avoid the concept of particular nodes holding
> > >> specific resources.  (As a query engine, we don't currently have the
> > >> concept of things like regions.) We're trying to build upon Berkeley's
> > >> Sparrow work and avoid the concept of centralized scheduling.  The
> > >> driving node for a particular query is the only entity responsible for
> > >> pushing a query to completion and has direct RPC interaction with its
> > >> 'children'.
> > >>
> > >> Our current use of zookeeper is strictly for the purpose of service
> > >> registration and membership information.  If you want to see the (lack
> > >> of) complexity of our use right now, you can look here:
> > >>
> > >> https://github.com/apache/incubator-drill/tree/execwork/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/coord
> > >>
> > >> Thoughts?
> > >>
> > >> Jacques
> > >>
> > >> On Sun, Apr 21, 2013 at 2:05 PM, kishore g <[email protected]>
> wrote:
> > >> > Thanks, Ted, for making the case. I am pretty sure there were
> > >> > valid points.
> > >> >
> > >> > I did not get the zero-conf option. Is the concern that Helix
> > >> > needs to run as a separate service? Helix can be used in both
> > >> > modes, as a service and also as a library. We have deployed it in
> > >> > both modes and have seen the need for each within LinkedIn.
> > >> >
> > >> > It would be really great if I could get the actual requirements
> > >> > and do another evaluation pass.
> > >> >
> > >> > Thanks, and I appreciate your time in answering my questions.
> > >> >
> > >> > Thanks,
> > >> > Kishore G
> > >> >
> > >> >
> > >> > On Sun, Apr 21, 2013 at 10:35 AM, Ted Dunning <
> [email protected]>
> > >> wrote:
> > >> >
> > >> >> Kishore,
> > >> >>
> > >> >> I made the case for Helix and the group seems to have strongly
> > >> gravitated
> > >> >> to the lower level that Curator provides.
> > >> >>
> > >> >> One feature that would have improved the case for Helix would have
> > been
> > >> >> viable zero-conf operation as an option.
> > >> >>
> > >> >> The game isn't over, however, and if you would like to get involved
> > >> here on
> > >> >> Drill, it might help to have another point of view.
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Sun, Apr 21, 2013 at 9:08 AM, kishore g <[email protected]>
> > wrote:
> > >> >>
> > >> >> > Hi Michael,
> > >> >> >
> > >> >> > Thanks for the update. Here are my thoughts, though I can't
> > >> >> > resist saying good things about Helix, since I am the author
> > >> >> > :-).
> > >> >> >
> > >> >> > Here is how I see ZK vs. Curator vs. Helix.
> > >> >> >
> > >> >> > ZK is amazing for coordination and for maintaining cluster
> > >> >> > data like configuration. It provides the concept of ephemeral
> > >> >> > nodes, which can be used for liveness detection of a process.
> > >> >> > However, there are a lot of corner cases that are non-trivial
> > >> >> > to code. Curator is a library that makes it easy to use those
> > >> >> > APIs; it provides recipes such as leader election, barriers,
> > >> >> > etc. Helix provides a much higher abstraction: it treats the
> > >> >> > various components of a distributed system as first-class
> > >> >> > citizens and allows system builders to think in terms of
> > >> >> > nodes, resources, partitions, state machines, etc. Underneath,
> > >> >> > Helix uses zkclient (something like Curator) to make it easy
> > >> >> > to interact with ZooKeeper. We had plans to use Curator, but
> > >> >> > Helix needed really good performance in terms of start-up and
> > >> >> > failover time when we have thousands of partitions, and we had
> > >> >> > to use the low-level ZK APIs to achieve that.
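The ephemeral-node idea above is what gives ZK its liveness detection: an ephemeral znode vanishes automatically when the session that created it expires. A stdlib-only simulation of that behavior (no real ZooKeeper; all names here are made up for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simulates ZK ephemeral znodes: an entry counts as live only while its
// owning session has heartbeated within the session timeout. In real ZK the
// server deletes the znode when the session expires; here we just check the
// last heartbeat time on read.
public class EphemeralRegistry {
    private final long sessionTimeoutMs;
    // znode path -> last heartbeat time of the owning session
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    public EphemeralRegistry(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    public void createEphemeral(String path, long now) {
        lastHeartbeat.put(path, now);
    }

    public void heartbeat(String path, long now) {
        lastHeartbeat.replace(path, now);
    }

    // A node is considered live only if its session heartbeated recently.
    public boolean isLive(String path, long now) {
        Long last = lastHeartbeat.get(path);
        return last != null && (now - last) <= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        EphemeralRegistry zk = new EphemeralRegistry(5000);
        zk.createEphemeral("/live/node-1", 0);
        System.out.println(zk.isLive("/live/node-1", 4000));  // true: within timeout
        zk.heartbeat("/live/node-1", 4000);
        System.out.println(zk.isLive("/live/node-1", 8000));  // true: renewed
        System.out.println(zk.isLive("/live/node-1", 15000)); // false: session expired
    }
}
```

The corner cases mentioned above (flapping sessions, the gap between a process hanging and its session expiring) are exactly what make the real thing harder than this sketch.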
> > >> >> >
> > >> >> > From my experience building distributed systems, cluster
> > >> >> > management starts out very simple, and one can put together a
> > >> >> > prototype very quickly. But over time things get complicated
> > >> >> > and need many more features. At LinkedIn we started in a
> > >> >> > similar way, simply using ephemeral nodes to know whether we
> > >> >> > held a lock or not. But over time a lot of things got
> > >> >> > complicated: controlling the assignment from outside, evenly
> > >> >> > distributing locks, handing over locks gracefully, restricting
> > >> >> > which nodes can own a partition, cluster expansion, throttling
> > >> >> > of cluster-wide operations, etc. We ended up having to
> > >> >> > implement one solution for each feature. For every feature we
> > >> >> > took a lot of time to flush out issues with the ZK
> > >> >> > interaction, and we had huge scaling issues when we tried it
> > >> >> > with thousands of partitions and lots of ephemerals; it was a
> > >> >> > nightmare to debug. Over time, most systems come up with a
> > >> >> > state machine; for example, you can see this in the HBase
> > >> >> > master and in YARN (JobTracker, TaskTracker). It is fairly
> > >> >> > obvious that a state machine is the right way to build a large
> > >> >> > distributed system: it gives you the right level of
> > >> >> > abstraction and a much cleaner design. What Helix did was
> > >> >> > generalize this concept and allow one to configure the state
> > >> >> > machine.
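A toy version of the configurable state machine described above, using a MASTER/SLAVE/OFFLINE model similar to the ones Helix ships with. The transition table and class are illustrative only, not Helix's actual API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A tiny configurable state machine: the legal transitions are data, not
// code, so a controller can drive a replica through OFFLINE -> SLAVE ->
// MASTER without the node hard-coding any cluster logic.
public class ReplicaStateMachine {
    private final Map<String, Set<String>> legal = new HashMap<>();
    private String state = "OFFLINE";

    public void addTransition(String from, String to) {
        legal.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    public boolean transition(String to) {
        if (legal.getOrDefault(state, Set.of()).contains(to)) {
            state = to;
            return true;
        }
        return false; // illegal transition rejected
    }

    public String state() { return state; }

    public static void main(String[] args) {
        ReplicaStateMachine sm = new ReplicaStateMachine();
        // The transition table is configuration, supplied from outside.
        sm.addTransition("OFFLINE", "SLAVE");
        sm.addTransition("SLAVE", "MASTER");
        sm.addTransition("MASTER", "SLAVE");
        sm.addTransition("SLAVE", "OFFLINE");

        System.out.println(sm.transition("MASTER")); // false: must go via SLAVE
        System.out.println(sm.transition("SLAVE"));  // true
        System.out.println(sm.transition("MASTER")); // true
        System.out.println(sm.state());              // MASTER
    }
}
```

The point of making the table configurable is the one in the paragraph above: the same machinery serves HBase-style region assignment, queue consumers, or Drillbit roles just by swapping the state model.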
> > >> >> >
> > >> >> > All the other features were basically built on top of states
> > >> >> > and transitions. For example, we had some tasks that needed to
> > >> >> > be distributed among the nodes; when a node dies, its tasks
> > >> >> > should be taken up by another node, which is simple to do with
> > >> >> > ephemeral nodes. But let's say you want to limit the maximum
> > >> >> > number of tasks a node can handle: with Helix this is modelled
> > >> >> > as a constraint, and you can specify how many tasks can run on
> > >> >> > a node, per process, etc. That is controlled completely from
> > >> >> > outside, without having to change the application code.
> > >> >> > Similarly, when the dead node comes back, the other nodes have
> > >> >> > to gracefully hand over their tasks. It is not trivial to
> > >> >> > achieve this.
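The max-tasks-per-node constraint above can be sketched like this. In real Helix the limit comes from externally configured cluster constraints; the greedy assignment below is just an illustrative stand-in, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of constraint-based task assignment: each task goes to the
// least-loaded node that still has room, and placement fails once every
// node has reached its externally configured per-node limit.
public class ConstrainedAssigner {
    private final int maxTasksPerNode;
    private final Map<String, List<String>> assignments = new HashMap<>();

    public ConstrainedAssigner(int maxTasksPerNode, List<String> nodes) {
        this.maxTasksPerNode = maxTasksPerNode;
        for (String n : nodes) assignments.put(n, new ArrayList<>());
    }

    // Returns the chosen node, or null if every node is at its limit.
    public String assign(String task) {
        String best = null;
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (e.getValue().size() >= maxTasksPerNode) continue; // node full
            if (best == null
                    || e.getValue().size() < assignments.get(best).size()) {
                best = e.getKey();
            }
        }
        if (best != null) assignments.get(best).add(task);
        return best;
    }

    public static void main(String[] args) {
        ConstrainedAssigner a =
            new ConstrainedAssigner(2, List.of("node-1", "node-2"));
        for (int i = 1; i <= 5; i++) {
            System.out.println("task-" + i + " -> " + a.assign("task-" + i));
        }
        // With a limit of 2 per node, the fifth task cannot be placed (null).
    }
}
```

The value claimed for Helix in the text is that this limit lives outside the application, so an operator can change it without touching code like the above.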
> > >> >> >
> > >> >> > There are a lot of other things we have encountered while
> > >> >> > building distributed systems, and we have always been able to
> > >> >> > add them to Helix so that other systems can benefit. For
> > >> >> > example, I recently presented on how to test and debug
> > >> >> > large-scale distributed systems. Helix comes with tools that
> > >> >> > parse the ZK transaction logs and provide the exact sequence
> > >> >> > of steps that led to a failure. More details here:
> > >> >> > http://www.slideshare.net/KishoreGopalakrishna/data-driven-testing
> > >> >> >
> > >> >> > To summarize: it is not really ZK vs. Curator vs. Helix; it
> > >> >> > is about the level of abstraction one wants. One could build
> > >> >> > Helix using Curator, which uses ZK underneath. So it boils
> > >> >> > down to what system you are building and how complex it can
> > >> >> > get.
> > >> >> >
> > >> >> > There are definitely some use cases where Helix is not needed
> > >> >> > and is probably overkill, but Apache Drill looks like a
> > >> >> > project that will get pretty big, and I am sure you will see
> > >> >> > all the requirements we saw over time.
> > >> >> >
> > >> >> > Hope this helps. As I mentioned earlier, I will be happy to
> > >> >> > provide more details and contribute.
> > >> >> >
> > >> >> > thanks,
> > >> >> > Kishore G
> > >> >> >
> > >> >> >
> > >> >> > On Sun, Apr 21, 2013 at 1:57 AM, Michael Hausenblas <
> > >> >> > [email protected]> wrote:
> > >> >> >
> > >> >> > >
> > >> >> > > At the time I put the slides together, Helix was indeed
> > considered.
> > >> >> > > AFAICT, currently we seem to have settled on Netflix Curator
> [1],
> > >> >> > however.
> > >> >> > > I wouldn't exclude the possibility that we may utilise Helix in
> > >> future;
> > >> >> > > personally, I think it's a great thing.  Would be very
> > interested in
> > >> >> your
> > >> >> > > experiences with it (also, re Zk vs. Curator vs. Helix).
> > >> >> > >
> > >> >> > > Cheers,
> > >> >> > >                 Michael
> > >> >> > >
> > >> >> > > [1] https://github.com/Netflix/curator/wiki
> > >> >> > >
> > >> >> > > --
> > >> >> > > Michael Hausenblas
> > >> >> > > Ireland, Europe
> > >> >> > > http://mhausenblas.info/
> > >> >> > >
> > >> >> > > On 21 Apr 2013, at 08:39, kishore g <[email protected]>
> wrote:
> > >> >> > >
> > >> >> > > > Hello,
> > >> >> > > >
> > >> >> > > > I was reading the slide deck from Hadoop summit
> > >> >> > > >
> > >> >> > > > http://www.slideshare.net/Hadoop_Summit/understanding-the-value-and-architecture-of-apache-drill
> > >> >> > > >
> > >> >> > > > On slide 27 there is a mention of using Helix for
> > >> >> > > > partition and resource management. I could not find many
> > >> >> > > > details on
> > >> >> > > > https://issues.apache.org/jira/browse/DRILL-53
> > >> >> > > >
> > >> >> > > > Can someone provide more details on this? We might be able
> > >> >> > > > to contribute.
> > >> >> > > >
> > >> >> > > > thanks,
> > >> >> > > > Kishore G
> > >> >> > >
> > >> >> > >
> > >> >> >
> > >> >>
> > >>
> >
>
