+1.

On Tue, Jun 17, 2014 at 1:27 PM, Supun Kamburugamuva <[email protected]> wrote:
> Also, I believe tool X has to do some distributed coordination among the
> GFac instances, so I think we are talking about ZooKeeper or an
> equivalent :)
>
> On Tue, Jun 17, 2014 at 1:12 PM, Lahiru Gunathilake <[email protected]> wrote:
>> Hi All,
>>
>> Ignoring the tool we are going to use to implement fault tolerance, I
>> have summarized the model we have agreed on so far. I will call the tool
>> X; we can use ZooKeeper or some other implementation. The following
>> design assumes that tool X and the Registry are highly available.
>>
>> 1. Orchestrator-to-GFac-worker communication is queue based, and tool X
>> is used for this communication. (We have to implement this with the race
>> condition between different GFac workers in mind.)
>> 2. We run multiple identical GFac instances (in the future we can group
>> GFac workers). The existence of each worker node is tracked through X;
>> if a node goes down, the Orchestrator is notified by X.
>> 3. When a request comes in and is accepted by a GFac worker, that
>> information is replicated in tool X, in a place where it is persisted
>> even if the worker fails.
>> 4. When a job reaches a final state (failed, cancelled, or completed),
>> the information above is removed. So at any given time the Orchestrator
>> can poll the active jobs of each worker by worker ID.
>> 5. Tool X makes sure the Orchestrator is notified when a worker goes
>> down. On a worker failure, based on steps 3 and 4, the Orchestrator can
>> poll all the active jobs of that worker and do the same thing as in
>> step 1 (store the experiment IDs in the queue), and a GFac worker will
>> pick the jobs up.
>> 6. When GFac receives a job as in step 5, it has to carefully evaluate
>> the state from the Registry and decide what to do (if the job is
>> pending, GFac just has to monitor it; if the job state is, say, inputs
>> transferred but not yet submitted, GFac has to execute the rest of the
>> chain, submit the job to the resource, and start monitoring).
>>
>> If we can find a tool X that supports all these features, and the tool
>> itself is fault tolerant and offers atomicity, high availability, and a
>> simple API, we can use it.
>>
>> WDYT?
>>
>> Lahiru
>>
>> On Mon, Jun 16, 2014 at 2:38 PM, Supun Kamburugamuva <[email protected]> wrote:
>>> Hi Lahiru,
>>>
>>> Before moving to an implementation it may be worth considering some of
>>> the following aspects as well.
>>>
>>> 1. How do we report the progress of an experiment as state in
>>> ZooKeeper? What happens if a GFac instance crashes while executing an
>>> experiment? Are there checkpoints we can save so that another GFac
>>> instance can take over?
>>> 2. What is the threading model of GFac instances? (I consider this a
>>> very important aspect.)
>>> 3. What information needs to be stored in ZooKeeper? You may need to
>>> store other information about an experiment apart from its experiment
>>> ID.
>>> 4. How do we report errors?
>>> 5. Does GFac need a threading model or a worker process model?
>>>
>>> Thanks,
>>> Supun..
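
A minimal sketch of how steps 2-4 could look if tool X is ZooKeeper, using
the Apache Curator client. All paths, IDs, and hosts below are illustrative
assumptions, not decided conventions:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    // A GFac worker announces itself with an ephemeral znode (step 2: it
    // disappears automatically when the worker's session dies, so watchers
    // learn of the failure) and records each accepted job under a
    // persistent znode that survives a worker crash (step 3).
    public class GfacWorkerRegistration {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            String workerId = "gfac-worker-1";      // hypothetical ID
            String jobPath = "/airavata/jobs/" + workerId + "/experiment-123";

            // Step 2: liveness marker, removed by ZooKeeper on failure.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL)
                  .forPath("/airavata/workers/" + workerId);

            // Step 3: accepted-job record, persists across worker crashes.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.PERSISTENT)
                  .forPath(jobPath);

            // Step 4 would delete the record once the job reaches a final
            // state: client.delete().forPath(jobPath);
        }
    }

The Orchestrator could then list the children of /airavata/jobs/<workerId>
to find the active jobs of a failed worker and re-queue them, as step 5
describes.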
>>> On Mon, Jun 16, 2014 at 2:22 PM, Lahiru Gunathilake <[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> I think the conclusion is this:
>>>>
>>>> 1. We make GFac a worker rather than a Thrift service, and we can
>>>> start multiple workers, either with a bunch of providers and handlers
>>>> configured in each worker, or with provider-specific workers to handle
>>>> classpath issues (not the common scenario).
>>>>
>>>> 2. GFac workers can be configured to watch a given path in ZooKeeper,
>>>> and multiple workers can listen on the same path. The default path can
>>>> be /airavata/gfac, or we can configure paths like /airavata/gfac/gsissh
>>>> and /airavata/gfac/bes.
>>>>
>>>> 3. The Orchestrator can be configured with logic to store experiment
>>>> IDs in ZooKeeper under a path, including provider-specific path logic.
>>>> So when a new request comes in, the Orchestrator stores the experiment
>>>> ID, and these experiment IDs are kept in ZooKeeper as a queue.
>>>>
>>>> 4. Since the GFac workers are watching, they will be notified, and as
>>>> Supun suggested we can use a leader election algorithm[1] so that one
>>>> GFac worker takes leadership of each experiment (see the sketch below,
>>>> after Marlon's restatement). If there are GFac instances per provider,
>>>> the same logic applies among the nodes with the same provider type.
>>>>
>>>> [1] http://curator.apache.org/curator-recipes/leader-election.html
>>>>
>>>> I would like to implement this if there are no objections.
>>>>
>>>> Lahiru
>>>>
>>>> On Mon, Jun 16, 2014 at 11:51 AM, Supun Kamburugamuva <[email protected]> wrote:
>>>>> Hi Marlon,
>>>>>
>>>>> I think you are exactly correct.
>>>>>
>>>>> Supun..
>>>>>
>>>>> On Mon, Jun 16, 2014 at 11:48 AM, Marlon Pierce <[email protected]> wrote:
>>>>>> Let me restate this, and please tell me if I'm wrong.
>>>>>>
>>>>>> The Orchestrator decides (somehow) that a particular job requires
>>>>>> JSDL/BES, so it places the experiment ID in ZooKeeper's
>>>>>> /airavata/gfac/jsdl-bes node. The GFac servers associated with this
>>>>>> instance notice the update. The first GFac to claim the job gets it
>>>>>> and uses the experiment ID to get the detailed information it needs
>>>>>> from the Registry. ZooKeeper handles the locking etc. to make sure
>>>>>> that only one GFac at a time is trying to handle an experiment.
>>>>>>
>>>>>> Marlon
>>>>>>
>>>>>> On 6/16/14, 11:42 AM, Lahiru Gunathilake wrote:
>>>>>>> Hi Supun,
>>>>>>>
>>>>>>> Thanks for the clarification.
>>>>>>>
>>>>>>> Regards
>>>>>>> Lahiru
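
A sketch of point 4 with the Curator leader election recipe [1]: each GFac
worker that notices a new experiment enters an election under a
per-experiment path, and only the winner executes it. The path layout and
experiment ID are illustrative assumptions:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    // Every watching GFac worker runs this; Curator guarantees exactly
    // one of them holds leadership for the experiment at a time.
    public class ExperimentLeaderElection {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            final String experimentId = "experiment-123"; // hypothetical
            LeaderSelector selector = new LeaderSelector(client,
                    "/airavata/gfac/elections/" + experimentId,
                    new LeaderSelectorListenerAdapter() {
                        @Override
                        public void takeLeadership(CuratorFramework c)
                                throws Exception {
                            // Only the winning worker reaches this point;
                            // execute the experiment, then return to
                            // relinquish leadership.
                            System.out.println("Executing " + experimentId);
                        }
                    });
            selector.start();    // joins the election, returns immediately
            Thread.sleep(10000); // keep the demo process alive briefly
            selector.close();
        }
    }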
>>>>>>> On Mon, Jun 16, 2014 at 11:38 AM, Supun Kamburugamuva <[email protected]> wrote:
>>>>>>>> Hi Lahiru,
>>>>>>>>
>>>>>>>> My suggestion is that maybe you don't need a Thrift service between
>>>>>>>> the Orchestrator and the component executing the experiment. When a
>>>>>>>> new experiment is submitted, the Orchestrator decides who can
>>>>>>>> execute this job, then puts the information about this experiment
>>>>>>>> execution in ZooKeeper. The component that wants to execute the
>>>>>>>> experiment is listening on this ZooKeeper path, and when it sees
>>>>>>>> the experiment it will execute it. So the communication happens
>>>>>>>> through a state change in ZooKeeper. This can potentially simplify
>>>>>>>> your architecture.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun.
>>>>>>>>
>>>>>>>> On Mon, Jun 16, 2014 at 11:14 AM, Lahiru Gunathilake <[email protected]> wrote:
>>>>>>>>> Hi Supun,
>>>>>>>>>
>>>>>>>>> So your suggestion is to create a znode for each Thrift service we
>>>>>>>>> have; when a request comes in, that node gets modified with the
>>>>>>>>> input data for the request; and the Thrift service, which has a
>>>>>>>>> watch on that node, gets notified because of the watch and can
>>>>>>>>> read the input from ZooKeeper and invoke the operation?
>>>>>>>>>
>>>>>>>>> Lahiru
>>>>>>>>>
>>>>>>>>> On Thu, Jun 12, 2014 at 11:50 PM, Supun Kamburugamuva <[email protected]> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Here is what I think about Airavata and ZooKeeper. In Airavata
>>>>>>>>>> there are many components, and these components must be stateless
>>>>>>>>>> to achieve scalability and reliability. There must also be a
>>>>>>>>>> mechanism to communicate between the components. At the moment
>>>>>>>>>> Airavata uses RPC calls based on Thrift for the communication.
>>>>>>>>>>
>>>>>>>>>> ZooKeeper can be used both as a place to hold state and as a
>>>>>>>>>> communication layer between the components. I'm involved with a
>>>>>>>>>> project that has many distributed components, like Airavata.
>>>>>>>>>> Right now we use Thrift services to communicate among the
>>>>>>>>>> components, but we find it difficult to use RPC calls and achieve
>>>>>>>>>> stateless behaviour, and we are thinking of replacing the Thrift
>>>>>>>>>> services with a ZooKeeper-based communication layer. So I think
>>>>>>>>>> it is better to explore the possibility of removing the Thrift
>>>>>>>>>> services between the components and using ZooKeeper as a
>>>>>>>>>> communication mechanism between the services. If you do this you
>>>>>>>>>> will have to move the state to ZooKeeper, and you will
>>>>>>>>>> automatically achieve stateless behaviour in the components.
>>>>>>>>>>
>>>>>>>>>> Also, I think trying to make ZooKeeper optional is a bad idea. If
>>>>>>>>>> we are integrating something as fundamentally important to the
>>>>>>>>>> architecture as how we store state, we shouldn't make it
>>>>>>>>>> optional.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun..
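
A sketch of the watch-and-claim flow Supun and Marlon describe above, using
Curator's PathChildrenCache. The Orchestrator enqueues an experiment by
creating a child of a queue path; every GFac worker watches that path and
races to claim each new experiment by creating an ephemeral "claim" child.
znode creation is atomic, so exactly one worker wins, and the claim vanishes
if that worker dies. The queue path is an illustrative assumption:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.cache.PathChildrenCache;
    import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;

    public class WatchAndClaimWorker {
        static final String QUEUE = "/airavata/gfac/jsdl-bes"; // illustrative

        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            PathChildrenCache cache = new PathChildrenCache(client, QUEUE, false);
            cache.getListenable().addListener((c, event) -> {
                if (event.getType() == PathChildrenCacheEvent.Type.CHILD_ADDED) {
                    String experimentPath = event.getData().getPath();
                    try {
                        // Atomic create: the first worker to succeed owns
                        // the experiment.
                        c.create().withMode(CreateMode.EPHEMERAL)
                         .forPath(experimentPath + "/claim");
                        System.out.println("Claimed " + experimentPath);
                        // ...fetch details from the Registry and execute...
                    } catch (KeeperException.NodeExistsException e) {
                        // Another GFac worker claimed it first; skip it.
                    }
                }
            });
            cache.start();
            Thread.sleep(Long.MAX_VALUE); // keep the worker process alive
        }
    }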
>>>>>>>>>> On Thu, Jun 12, 2014 at 10:57 PM, Shameera Rathnayaka <[email protected]> wrote:
>>>>>>>>>>> Hi Lahiru,
>>>>>>>>>>>
>>>>>>>>>>> As I understand it, you are trying to achieve not only
>>>>>>>>>>> reliability but some other requirements by introducing
>>>>>>>>>>> ZooKeeper, such as health monitoring of the services,
>>>>>>>>>>> categorization by service implementation, etc. In that case I
>>>>>>>>>>> think we can make use of ZooKeeper's features, but if we only
>>>>>>>>>>> focus on reliability I have a bit of a concern: why can't we
>>>>>>>>>>> use clustering + a load balancer?
>>>>>>>>>>>
>>>>>>>>>>> Yes, it is better to add ZooKeeper as a prerequisite if users
>>>>>>>>>>> need to use it.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shameera.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 12, 2014 at 5:19 AM, Lahiru Gunathilake <[email protected]> wrote:
>>>>>>>>>>>> Hi Gagan,
>>>>>>>>>>>>
>>>>>>>>>>>> I need to start another discussion about that; I had an offline
>>>>>>>>>>>> discussion with Suresh about auto-scaling. I will start another
>>>>>>>>>>>> thread about this topic too.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 11, 2014 at 4:10 PM, Gagan Juneja <[email protected]> wrote:
>>>>>>>>>>>>> Thanks Lahiru for pointing to a nice library; added to my
>>>>>>>>>>>>> dictionary :).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to know how we are planning to start multiple
>>>>>>>>>>>>> servers:
>>>>>>>>>>>>> 1. Spawning new servers based on load? Sometimes we call this
>>>>>>>>>>>>> auto-scalable.
>>>>>>>>>>>>> 2. Keeping a specific number of nodes available, e.g. we want
>>>>>>>>>>>>> 2 servers available at any time, so if one goes down I need
>>>>>>>>>>>>> to spawn a new one to bring the available server count back
>>>>>>>>>>>>> to 2.
>>>>>>>>>>>>> 3. Initially starting all the servers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In scenarios 1 and 2 ZooKeeper does make sense, but I don't
>>>>>>>>>>>>> believe the existing architecture supports this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Gagan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12-Jun-2014 1:19 am, "Lahiru Gunathilake" <[email protected]> wrote:
>>>>>>>>>>>>>> Hi Gagan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your response. Please see my inline comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jun 11, 2014 at 3:37 PM, Gagan Juneja <[email protected]> wrote:
>>>>>>>>>>>>>>> Hi Lahiru,
>>>>>>>>>>>>>>> Just my 2 cents.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am a big fan of ZooKeeper, but I am also against adding
>>>>>>>>>>>>>>> multiple hops to the system, which can add unnecessary
>>>>>>>>>>>>>>> complexity. Here I am not able to understand the requirement
>>>>>>>>>>>>>>> for ZooKeeper; maybe I am wrong because of limited knowledge
>>>>>>>>>>>>>>> of the Airavata system as a whole. So I would like to
>>>>>>>>>>>>>>> discuss the following points.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. How will it help us make the system more reliable?
>>>>>>>>>>>>>>> ZooKeeper is not able to restart services. At most it can
>>>>>>>>>>>>>>> tell whether a service is up or not, which could only be
>>>>>>>>>>>>>>> the case if the Airavata service goes down gracefully and
>>>>>>>>>>>>>>> we have some automated way to restart it. If this is just a
>>>>>>>>>>>>>>> matter of routing client requests to the available Thrift
>>>>>>>>>>>>>>> servers, then this can be achieved with the help of a load
>>>>>>>>>>>>>>> balancer, which I guess is already on the Thrift wish list.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have multiple Thrift services, we currently start only one
>>>>>>>>>>>>>> instance of each, and each Thrift service is stateless. For
>>>>>>>>>>>>>> high availability we have to start multiple instances of
>>>>>>>>>>>>>> them in a production scenario. So, for clients to get an
>>>>>>>>>>>>>> available Thrift service, we can use ZooKeeper znodes to
>>>>>>>>>>>>>> represent each available service. There are some libraries
>>>>>>>>>>>>>> doing something similar[1], and I think we can use them
>>>>>>>>>>>>>> directly.
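
A minimal sketch of representing each live Thrift service instance as an
ephemeral znode, as suggested above. The znode carries the instance's
host:port and disappears automatically when the service's ZooKeeper session
dies, so the children of the path are always the live instances. The path
layout and host:port are illustrative assumptions:

    import java.nio.charset.StandardCharsets;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    public class ThriftServiceRegistrar {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            // EPHEMERAL_SEQUENTIAL gives every instance a unique child
            // name under the service's path; the data is its address.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
                  .forPath("/airavata/services/orchestrator/instance-",
                           "gw1.example.org:8940".getBytes(StandardCharsets.UTF_8));

            Thread.sleep(Long.MAX_VALUE); // registration lives and dies
                                          // with this process
        }
    }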
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. As far as registering different providers is concerned,
>>>>>>>>>>>>>>> do you think we really need an external store for that?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I think so, because it is lightweight and reliable, and
>>>>>>>>>>>>>> we have to do a very minimal amount of work to bring all
>>>>>>>>>>>>>> these features to Airavata, because ZooKeeper handles all
>>>>>>>>>>>>>> the complexity.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have seen people using ZooKeeper more for state
>>>>>>>>>>>>>>> management in distributed environments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1. We might not be the most effective users of ZooKeeper,
>>>>>>>>>>>>>> because all of our services are stateless, but my point is
>>>>>>>>>>>>>> that we can use ZooKeeper to achieve fault tolerance with
>>>>>>>>>>>>>> minimal work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like to understand more about how we can leverage
>>>>>>>>>>>>>>> ZooKeeper in Airavata to make the system reliable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://github.com/eirslett/thrift-zookeeper
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Gagan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12-Jun-2014 12:33 am, "Marlon Pierce" <[email protected]> wrote:
>>>>>>>>>>>>>>>> Thanks for the summary, Lahiru. I'm cc'ing the
>>>>>>>>>>>>>>>> Architecture list for additional comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Marlon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/11/14 2:27 PM, Lahiru Gunathilake wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I did a little research on Apache ZooKeeper[1] and how to
>>>>>>>>>>>>>>>>> use it in Airavata. It is really a nice way to achieve
>>>>>>>>>>>>>>>>> fault tolerance and reliable communication between our
>>>>>>>>>>>>>>>>> Thrift services and clients. ZooKeeper is a distributed,
>>>>>>>>>>>>>>>>> fault-tolerant system for reliable communication between
>>>>>>>>>>>>>>>>> distributed applications. It is like an in-memory file
>>>>>>>>>>>>>>>>> system with nodes in a tree structure, where each node
>>>>>>>>>>>>>>>>> can have a small amount of data associated with it; these
>>>>>>>>>>>>>>>>> nodes are called znodes. Clients can connect to a
>>>>>>>>>>>>>>>>> ZooKeeper server and add, delete, and update these
>>>>>>>>>>>>>>>>> znodes.
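
For reference, the basic znode operations described above form a small API.
A minimal sketch with the plain ZooKeeper client (connection string and
path are illustrative):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeBasics {
        public static void main(String[] args) throws Exception {
            // The no-op lambda ignores connection events for brevity.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            zk.create("/demo", "hello".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData("/demo", false, null); // read
            zk.setData("/demo", "updated".getBytes(), -1);  // update, any version
            zk.delete("/demo", -1);                         // delete, any version
            zk.close();
        }
    }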
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In Apache Airavata we start multiple Thrift services, and
>>>>>>>>>>>>>>>>> these can go down for maintenance or crash. If we use
>>>>>>>>>>>>>>>>> ZooKeeper to store this configuration (the Thrift service
>>>>>>>>>>>>>>>>> configurations) we can achieve a very reliable system.
>>>>>>>>>>>>>>>>> Basically, Thrift clients can dynamically discover the
>>>>>>>>>>>>>>>>> available services by using ephemeral znodes (here we do
>>>>>>>>>>>>>>>>> not have to change the generated Thrift client code, only
>>>>>>>>>>>>>>>>> the locations at which we invoke it). Ephemeral znodes
>>>>>>>>>>>>>>>>> are removed when the Thrift service goes down, and
>>>>>>>>>>>>>>>>> ZooKeeper guarantees atomicity between these operations.
>>>>>>>>>>>>>>>>> With this approach we can have a node hierarchy for the
>>>>>>>>>>>>>>>>> multiple Airavata, Orchestrator, AppCatalog, and GFac
>>>>>>>>>>>>>>>>> Thrift services.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Specifically for GFac, we can have different types of
>>>>>>>>>>>>>>>>> services for each provider implementation. This can be
>>>>>>>>>>>>>>>>> achieved by using the hierarchy support in ZooKeeper and
>>>>>>>>>>>>>>>>> adding some logic to the GFac Thrift service to register
>>>>>>>>>>>>>>>>> itself under a defined path. Using the same logic, the
>>>>>>>>>>>>>>>>> Orchestrator can discover the provider-specific GFac
>>>>>>>>>>>>>>>>> Thrift service and route the message to the correct
>>>>>>>>>>>>>>>>> Thrift service (a discovery sketch follows at the end of
>>>>>>>>>>>>>>>>> this message).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With this approach I think we simply have to write some
>>>>>>>>>>>>>>>>> client code in the Thrift services and clients. The
>>>>>>>>>>>>>>>>> ZooKeeper server installation can be done as a separate
>>>>>>>>>>>>>>>>> process, and it will be easier to keep the ZooKeeper
>>>>>>>>>>>>>>>>> server separate from Airavata, because installing a
>>>>>>>>>>>>>>>>> ZooKeeper server is a little complex in a production
>>>>>>>>>>>>>>>>> scenario. I think we have to make sure everything works
>>>>>>>>>>>>>>>>> fine when there is no ZooKeeper running; e.g.,
>>>>>>>>>>>>>>>>> enable.zookeeper=false should work fine, and users
>>>>>>>>>>>>>>>>> shouldn't have to download and start ZooKeeper.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] http://zookeeper.apache.org/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Lahiru
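
A sketch of the client side of this proposal: before invoking a GFac Thrift
service, the Orchestrator looks up the live instances registered for the
required provider under a hierarchical path and picks one; only the lookup
is new, the generated Thrift client code is unchanged. The path layout
(/airavata/services/gfac/<provider>) is an illustrative assumption:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ProviderAwareLookup {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            // Live instances for the gsissh provider, maintained as
            // ephemeral znodes by the services themselves.
            String base = "/airavata/services/gfac/gsissh";
            List<String> instances = client.getChildren().forPath(base);
            if (instances.isEmpty()) {
                throw new IllegalStateException("no live gsissh GFac instance");
            }

            // Naive choice of the first instance; a real client might
            // load-balance or retry on connection failure.
            byte[] hostPort = client.getData()
                                    .forPath(base + "/" + instances.get(0));
            System.out.println("Connect the Thrift client to "
                    + new String(hostPort, StandardCharsets.UTF_8));
        }
    }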

--
System Analyst Programmer
PTI Lab
Indiana University
