Re: Zookeeper in Airavata to achieve reliability

Lahiru Gunathilake Wed, 18 Jun 2014 09:12:33 -0700

Hi Eran,


On Tue, Jun 17, 2014 at 4:06 PM, Eran Chinthaka Withana <
[email protected]> wrote:

> Storm has a Kafka spout which manages the cursor location (pointer to the
> head of the queue representing the next message to be processed) inside ZK.
> Each storm spout instance uses this information to get the next item to
> process. Storm kafka spout won't advance to the next message until it gets
> an ack from the storm topology.
>
Suppose we have 10 jobs in the queue and 5 GFAC instances. Each instance
picks one job, submits it successfully, and then has to start taking the
rest of the jobs. But all 5 GFAC instances are still responsible for the 5
jobs they initially picked, because those jobs are still running and the
GFAC instances monitor them until they are done. At the same time, we have
to move the cursor to pick up the other jobs too.

If we ack and move the cursor just after submission, without waiting until
the job has actually finished, how are we going to know which GFAC is
monitoring which set of jobs?

I do not see how to achieve the above requirement with this suggestion.
Maybe I am missing something here.
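
To make the requirement concrete, here is a rough sketch of the bookkeeping
I mean. It is plain Python with in-memory stand-ins for the job queue and
for ZooKeeper; JobTracker, claim, finish and recover are hypothetical names,
not Airavata, Storm or Kafka APIs. A worker acks right after submission, but
the worker-to-job assignment stays recorded until the job reaches a final
state, so the jobs of a failed worker can be re-queued:

```python
from collections import deque


class JobTracker:
    """In-memory stand-in for the queue cursor plus a ZooKeeper-like store.

    Hypothetical sketch: a real deployment would keep `active` in a
    replicated store so that it survives a worker crash.
    """

    def __init__(self, job_ids):
        self.queue = deque(job_ids)  # jobs not yet picked by any worker
        self.active = {}             # worker id -> set of job ids it monitors

    def claim(self, worker_id):
        """Pick the next job and record who owns it (the 'ack' moment)."""
        if not self.queue:
            return None
        job_id = self.queue.popleft()
        self.active.setdefault(worker_id, set()).add(job_id)
        return job_id

    def finish(self, worker_id, job_id):
        """Drop the record once the job reaches a final state."""
        self.active.get(worker_id, set()).discard(job_id)

    def recover(self, failed_worker_id):
        """Re-queue every job the failed worker was still monitoring."""
        orphaned = sorted(self.active.pop(failed_worker_id, set()))
        self.queue.extend(orphaned)
        return orphaned


tracker = JobTracker([f"job-{i}" for i in range(10)])
workers = [f"gfac-{i}" for i in range(5)]

# Each worker submits one job and acks immediately, so the cursor moves on.
for w in workers:
    tracker.claim(w)

# Workers keep picking jobs while still monitoring their first ones.
tracker.claim("gfac-0")            # gfac-0 now monitors job-0 and job-5
tracker.finish("gfac-1", "job-1")  # job-1 reached a final state

# gfac-0 crashes: its two jobs go back to the queue for another worker.
orphaned = tracker.recover("gfac-0")
print(orphaned)
```

With a cursor that only tracks acks there is no equivalent of `active`, and
that is the part I cannot map onto the suggestion.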

Regards
Lahiru

>
> So, if there is an exception in the topology and the ack is sent only by
> the last bolt, then Storm makes sure all messages are processed, since
> exceptions won't generate acks.
>
> Thanks,
> Eran Chinthaka Withana
>
>
> On Tue, Jun 17, 2014 at 12:30 PM, Lahiru Gunathilake <[email protected]>
> wrote:
>
> > Hi Eran,
> >
> > I think I should take back my last email. When I look carefully at Storm
> > I have the following question.
> >
> > How are we going to store the job statuses and relaunch the jobs which
> > were running on failed nodes? It's true that Storm starts new workers,
> > but there should be a way for someone in the system to find the missing
> > jobs. Since we do not have a data stream, there is no use in starting new
> > workers unless we handle the missing jobs. I think we need better control
> > of our component, and we need to persist the state of the jobs each GFAC
> > node is handling. Using ZooKeeper directly will let us do a proper
> > fault-tolerance implementation.
> >
> > Regards
> > Lahiru
> >
> >
> >
> > On Tue, Jun 17, 2014 at 3:14 PM, Lahiru Gunathilake <[email protected]>
> > wrote:
> >
> > > Hi Supun,
> > >
> > > I think in this use case we only use the Storm topology to do the
> > > communication among workers, and we are completely ignoring the stream
> > > processing part. Orchestrator will talk to Nimbus, and GFAC nodes will
> > > be worker nodes in the Storm topology. But I think we can achieve an
> > > extremely fault-tolerant system by directly using Storm, with minimal
> > > changes in Airavata, based on the following statement on the Storm
> > > site.
> > >
> > > Additionally, the Nimbus daemon and Supervisor daemons are fail-fast
> > > and stateless; all state is kept in Zookeeper or on local disk. This
> > > means you can kill -9 Nimbus or the Supervisors and they’ll start back
> > > up like nothing happened. This design leads to Storm clusters being
> > > incredibly stable.
> > >
> > >
> > >
> > >
> > > On Tue, Jun 17, 2014 at 3:02 PM, Supun Kamburugamuva <
> [email protected]>
> > > wrote:
> > >
> > >> Hi Eran,
> > >>
> > >> I'm using Storm every day and this is one of the strangest things I've
> > >> heard about using Storm. Maybe there are more use cases for Storm
> > >> other than distributed stream processing. AFAIK the bolts and spouts
> > >> are built to handle a stream of events that don't take much time to
> > >> process. In Airavata we don't process messages. Instead we run
> > >> experiments based on the commands given.
> > >>
> > >> If you want process isolation, distributed execution and cluster
> > >> resource management, YARN would be a better thing to explore.
> > >>
> > >> Thanks,
> > >> Supun..
> > >>
> > >>
> > >> On Tue, Jun 17, 2014 at 2:27 PM, Eran Chinthaka Withana <
> > >> [email protected]> wrote:
> > >>
> > >> > Hi Lahiru,
> > >> >
> > >> > Good summary. Thanks, Lahiru.
> > >> >
> > >> > I think you are trying to stick to a model where the Orchestrator
> > >> > distributes work to GFac workers, and trying to resolve the
> > >> > impedance mismatch through a messaging solution. If you step back
> > >> > and think, we don't even want the orchestrator to handle everything.
> > >> > From its point of view, it should submit jobs to the framework, and
> > >> > then wait or get notified once the job is done.
> > >> >
> > >> > There are multiple ways of doing this. And here is one method.
> > >> >
> > >> > Orchestrator submits all its jobs to a job queue (implemented using
> > >> > any MQ implementation like RabbitMQ or Kafka). A Storm topology is
> > >> > implemented to dequeue messages, process them (i.e. submit those
> > >> > jobs and get them executed) and notify the Orchestrator of the
> > >> > status (either through another JobCompletionQueue or direct
> > >> > invocation).
> > >> >
> > >> > With this approach, the MQ provider will help to match impedance
> > >> > between job submission and consumption. Storm helps with worker
> > >> > coordination, load balancing, throttling on your job execution
> > >> > framework, worker pool management and fault tolerance.
> > >> >
> > >> > Of course, you can implement this based only on ZK and handle
> > >> > everything else on your own, but Storm has done exactly that with
> > >> > the use of ZK underneath.
> > >> >
> > >> > Finally, if you go for a model like this, then even beyond job
> > >> > submission, you can use the same model to do anything within the
> > >> > framework for internal communication. For example, the workflow
> > >> > engine will submit its jobs to queues based on what it has to do. A
> > >> > Storm topology exists for each queue to dequeue messages and carry
> > >> > out the work in a reliable manner. Consider these as mini-workflows
> > >> > within a larger workflow framework.
> > >> >
> > >> > We can have a voice chat if it's more convenient. But not at 7am
> > >> > PST :)
> > >> >
> > >> >
> > >> > Thanks,
> > >> > Eran Chinthaka Withana
> > >> >
> > >> >
> > >> > On Tue, Jun 17, 2014 at 10:12 AM, Lahiru Gunathilake <
> > [email protected]
> > >> >
> > >> > wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > Ignoring the tool that we are going to use to implement fault
> > >> > > tolerance, I have summarized the model we have decided on so far.
> > >> > > I will call the tool X; it can be ZooKeeper or some other
> > >> > > implementation. The following design assumes that tool X and the
> > >> > > Registry are highly available.
> > >> > >
> > >> > > 1. Orchestrator and GFAC worker node communication is going to be
> > >> > > queue based, and tool X is going to be used for this
> > >> > > communication. (We have to implement this with race conditions
> > >> > > between different GFAC workers in mind.)
> > >> > > 2. We have multiple identical instances of GFAC (in future we can
> > >> > > group GFAC workers). The existence of each worker node is tracked
> > >> > > using X. If a node goes down, the orchestrator will be notified
> > >> > > by X.
> > >> > > 3. When a particular request comes in and is accepted by one GFAC
> > >> > > worker, that information will be replicated in tool X and in a
> > >> > > place where it stays persisted even if the worker fails.
> > >> > > 4. When a job reaches a final state such as failed, cancelled or
> > >> > > completed, the above information will be removed. So at any given
> > >> > > time the orchestrator can poll the active jobs of each worker by
> > >> > > giving a worker ID.
> > >> > > 5. Tool X will make sure that when a worker goes down it notifies
> > >> > > the orchestrator. During a worker failure, based on steps 3 and 4,
> > >> > > the orchestrator can poll all the active jobs of that worker and
> > >> > > do the same thing as in step 1 (store the experiment ID in the
> > >> > > queue), and a GFAC worker will pick up the jobs.
> > >> > >
> > >> > > 6. When GFAC receives a job as in step 5, it has to carefully
> > >> > > evaluate the state from the registry and decide what is to be
> > >> > > done (if the job is pending, GFAC just has to monitor it; if the
> > >> > > job state is, say, input transferred but not yet submitted, GFAC
> > >> > > has to execute the rest of the chain, submit the job to the
> > >> > > resource and start monitoring).
> > >> > >
> > >> > > If we can find a tool X which supports all these features, and
> > >> > > the tool itself is fault tolerant and supports atomicity, high
> > >> > > availability and a simple API to implement against, we can use
> > >> > > that tool.
> > >> > >
> > >> > > WDYT ?
> > >> > >
> > >> > > Lahiru
> > >> > >
> > >> > >
> > >> > > On Mon, Jun 16, 2014 at 2:38 PM, Supun Kamburugamuva <
> > >> [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Lahiru,
> > >> > > >
> > >> > > > Before moving ahead with an implementation, it may be worth
> > >> > > > considering some of the following aspects as well.
> > >> > > >
> > >> > > > 1. How to report the progress of an experiment as state in
> > >> > > > ZooKeeper? What happens if a GFac instance crashes while
> > >> > > > executing an experiment? Are there checkpoints we can save so
> > >> > > > that another GFac instance can take over?
> > >> > > > 2. What is the threading model of GFac instances? (I consider
> > >> > > > this a very important aspect.)
> > >> > > > 3. What information needs to be stored in ZooKeeper? You may
> > >> > > > need to store other information about an experiment apart from
> > >> > > > its experiment ID.
> > >> > > > 4. How to report errors?
> > >> > > > 5. For GFac, do you need a threading model or a worker process
> > >> > > > model?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Supun..
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Jun 16, 2014 at 2:22 PM, Lahiru Gunathilake <
> > >> [email protected]
> > >> > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi All,
> > >> > > > >
> > >> > > > > I think the conclusion is like this:
> > >> > > > >
> > >> > > > > 1. We make GFAC a worker, not a Thrift service, and we can
> > >> > > > > start multiple workers, either with a bunch of providers and
> > >> > > > > handlers configured in each worker, or provider-specific
> > >> > > > > workers to handle class path issues (not the common scenario).
> > >> > > > >
> > >> > > > > 2. GFAC workers can be configured to watch a given path in
> > >> > > > > ZooKeeper, and multiple workers can listen to the same path.
> > >> > > > > The default path can be /airavata/gfac, or we can configure
> > >> > > > > paths like /airavata/gfac/gsissh and /airavata/gfac/bes.
> > >> > > > >
> > >> > > > > 3. Orchestrator can be configured with logic to store
> > >> > > > > experiment IDs in ZooKeeper under a path, and it can be
> > >> > > > > configured with provider-specific path logic too. So when a
> > >> > > > > new request comes in, the orchestrator stores the experiment
> > >> > > > > ID, and these experiment IDs are stored in ZK as a queue.
> > >> > > > >
> > >> > > > > 4. Since GFAC workers are watching, they will be notified, and
> > >> > > > > as Supun suggested we can use a leader election algorithm[1]
> > >> > > > > so that one GFAC worker takes the leadership for each
> > >> > > > > experiment. If there are GFAC instances for each provider, the
> > >> > > > > same logic will apply among the nodes with the same provider
> > >> > > > > type.
> > >> > > > >
> > >> > > > > [1] http://curator.apache.org/curator-recipes/leader-election.html
> > >> > > > >
> > >> > > > > I would like to implement this if there are no objections.
> > >> > > > >
> > >> > > > > Lahiru
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jun 16, 2014 at 11:51 AM, Supun Kamburugamuva <
> > >> > > [email protected]
> > >> > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Marlon,
> > >> > > > > >
> > >> > > > > > I think you are exactly correct.
> > >> > > > > >
> > >> > > > > > Supun..
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Mon, Jun 16, 2014 at 11:48 AM, Marlon Pierce <
> > >> [email protected]>
> > >> > > > wrote:
> > >> > > > > >
> > >> > > > > > > Let me restate this, and please tell me if I'm wrong.
> > >> > > > > > >
> > >> > > > > > > Orchestrator decides (somehow) that a particular job
> > >> > > > > > > requires JSDL/BES, so it places the Experiment ID in
> > >> > > > > > > Zookeeper's /airavata/gfac/jsdl-bes node. GFAC servers
> > >> > > > > > > associated with this instance notice the update. The
> > >> > > > > > > first GFAC to claim the job gets it, uses the Experiment
> > >> > > > > > > ID to get the detailed information it needs from the
> > >> > > > > > > Registry. ZooKeeper handles the locking, etc. to make
> > >> > > > > > > sure that only one GFAC at a time is trying to handle an
> > >> > > > > > > experiment.
> > >> > > > > > >
> > >> > > > > > > Marlon
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On 6/16/14, 11:42 AM, Lahiru Gunathilake wrote:
> > >> > > > > > >
> > >> > > > > > >> Hi Supun,
> > >> > > > > > >>
> > >> > > > > > >> Thanks for the clarification.
> > >> > > > > > >>
> > >> > > > > > >> Regards
> > >> > > > > > >> Lahiru
> > >> > > > > > >>
> > >> > > > > > >>
> > >> > > > > > >> On Mon, Jun 16, 2014 at 11:38 AM, Supun Kamburugamuva <
> > >> > > > > > [email protected]>
> > >> > > > > > >> wrote:
> > >> > > > > > >>
> > >> > > > > > >>  Hi Lahiru,
> > >> > > > > > >>>
> > >> > > > > > >>> My suggestion is that maybe you don't need a Thrift
> > >> > > > > > >>> service between Orchestrator and the component executing
> > >> > > > > > >>> the experiment. When a new experiment is submitted,
> > >> > > > > > >>> orchestrator decides who can execute this job. Then it
> > >> > > > > > >>> puts the information about this experiment execution in
> > >> > > > > > >>> ZooKeeper. The component which wants to execute the
> > >> > > > > > >>> experiment is listening to this ZooKeeper path, and when
> > >> > > > > > >>> it sees the experiment it will execute it. So the
> > >> > > > > > >>> communication happens through a state change in
> > >> > > > > > >>> ZooKeeper. This can potentially simplify your
> > >> > > > > > >>> architecture.
> > >> > > > > > >>>
> > >> > > > > > >>> Thanks,
> > >> > > > > > >>> Supun.
> > >> > > > > > >>>
> > >> > > > > > >>>
> > >> > > > > > >>> On Mon, Jun 16, 2014 at 11:14 AM, Lahiru Gunathilake <
> > >> > > > > > [email protected]>
> > >> > > > > > >>> wrote:
> > >> > > > > > >>>
> > >> > > > > > >>>  Hi Supun,
> > >> > > > > > >>>>
> > >> > > > > > >>>> So your suggestion is to create a znode for each Thrift
> > >> > > > > > >>>> service we have. When a request comes in, that node gets
> > >> > > > > > >>>> modified with the input data for that request; the
> > >> > > > > > >>>> Thrift service has a watch on that node, so it will be
> > >> > > > > > >>>> notified and can read the input from ZooKeeper and
> > >> > > > > > >>>> invoke the operation?
> > >> > > > > > >>>>
> > >> > > > > > >>>> Lahiru
> > >> > > > > > >>>>
> > >> > > > > > >>>>
> > >> > > > > > >>>> On Thu, Jun 12, 2014 at 11:50 PM, Supun Kamburugamuva <
> > >> > > > > > >>>> [email protected]>
> > >> > > > > > >>>> wrote:
> > >> > > > > > >>>>
> > >> > > > > > >>>>  Hi all,
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Here is what I think about Airavata and ZooKeeper. In
> > >> > > > > > >>>>> Airavata there are many components, and these components
> > >> > > > > > >>>>> must be stateless to achieve scalability and
> > >> > > > > > >>>>> reliability. Also, there must be a mechanism to
> > >> > > > > > >>>>> communicate between the components. At the moment
> > >> > > > > > >>>>> Airavata uses RPC calls based on Thrift for the
> > >> > > > > > >>>>> communication.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> ZooKeeper can be used both as a place to hold state and
> > >> > > > > > >>>>> as a communication layer between the components. I'm
> > >> > > > > > >>>>> involved with a project that has many distributed
> > >> > > > > > >>>>> components like Airavata. Right now we use Thrift
> > >> > > > > > >>>>> services to communicate among the components. But we
> > >> > > > > > >>>>> find it difficult to use RPC calls and achieve stateless
> > >> > > > > > >>>>> behaviour, and are thinking of replacing the Thrift
> > >> > > > > > >>>>> services with a ZooKeeper-based communication layer. So
> > >> > > > > > >>>>> I think it is better to explore the possibility of
> > >> > > > > > >>>>> removing the Thrift services between the components and
> > >> > > > > > >>>>> using ZooKeeper as a communication mechanism between the
> > >> > > > > > >>>>> services. If you do this you will have to move the state
> > >> > > > > > >>>>> to ZooKeeper, and will automatically achieve stateless
> > >> > > > > > >>>>> behaviour in the components.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Also, I think trying to make ZooKeeper optional is a bad
> > >> > > > > > >>>>> idea. If we are trying to integrate something as
> > >> > > > > > >>>>> fundamentally important to the architecture as how to
> > >> > > > > > >>>>> store state, we shouldn't make it optional.
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> Thanks,
> > >> > > > > > >>>>> Supun..
> > >> > > > > > >>>>>
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> On Thu, Jun 12, 2014 at 10:57 PM, Shameera Rathnayaka
> <
> > >> > > > > > >>>>> [email protected]> wrote:
> > >> > > > > > >>>>>
> > >> > > > > > >>>>>  Hi Lahiru,
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> As I understand it, you are trying to achieve not only
> > >> > > > > > >>>>>> reliability but also some other requirements by
> > >> > > > > > >>>>>> introducing ZooKeeper, like health monitoring of the
> > >> > > > > > >>>>>> services, categorization by service implementation,
> > >> > > > > > >>>>>> etc. In that case, I think we can make use of
> > >> > > > > > >>>>>> ZooKeeper's features, but if we only focus on
> > >> > > > > > >>>>>> reliability, I have a bit of a concern: why can't we
> > >> > > > > > >>>>>> use clustering + LB?
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> Yes, it is better that we add ZooKeeper as a
> > >> > > > > > >>>>>> prerequisite if users need to use it.
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> Thanks,
> > >> > > > > > >>>>>>   Shameera.
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> On Thu, Jun 12, 2014 at 5:19 AM, Lahiru Gunathilake <
> > >> > > > > > >>>>>> [email protected]
> > >> > > > > > >>>>>> wrote:
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>>  Hi Gagan,
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>> I need to start another discussion about it, but I had
> > >> > > > > > >>>>>>> an offline discussion with Suresh about auto-scaling. I
> > >> > > > > > >>>>>>> will start another thread about this topic too.
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>> Regards
> > >> > > > > > >>>>>>> Lahiru
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>> On Wed, Jun 11, 2014 at 4:10 PM, Gagan Juneja <
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>> [email protected]
> > >> > > > > > >>>>
> > >> > > > > > >>>>> wrote:
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>> Thanks Lahiru for pointing to a nice library, added to
> > >> > > > > > >>>>>>>> my dictionary :).
> > >> > > > > > >>>>>>>> I would like to know how we are planning to start
> > >> > > > > > >>>>>>>> multiple servers.
> > >> > > > > > >>>>>>>> 1. Spawning new servers based on load? Sometimes we
> > >> > > > > > >>>>>>>> call this auto-scaling.
> > >> > > > > > >>>>>>>> 2. Keeping a specific number of nodes available, e.g.
> > >> > > > > > >>>>>>>> we want 2 servers to be available at any time, so if
> > >> > > > > > >>>>>>>> one goes down I need to spawn a new one to bring the
> > >> > > > > > >>>>>>>> available server count back to 2.
> > >> > > > > > >>>>>>>> 3. Initially start all the servers.
> > >> > > > > > >>>>>>>>
> > >> > > > > > >>>>>>>> In scenarios 1 and 2 ZooKeeper does make sense, but I
> > >> > > > > > >>>>>>>> don't believe the existing architecture supports this?
> > >> > > > > > >>>>>>>>
> > >> > > > > > >>>>>>>> Regards,
> > >> > > > > > >>>>>>>> Gagan
> > >> > > > > > >>>>>>>> On 12-Jun-2014 1:19 am, "Lahiru Gunathilake" <
> > >> > > > [email protected]
> > >> > > > > >
> > >> > > > > > >>>>>>>>
> > >> > > > > > >>>>>>> wrote:
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>> Hi Gagan,
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> Thanks for your response. Please see my inline
> > >> > > > > > >>>>>>>>> comments.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> On Wed, Jun 11, 2014 at 3:37 PM, Gagan Juneja <
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>> [email protected]>
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>> wrote:
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>> Hi Lahiru,
> > >> > > > > > >>>>>>>>>> Just my 2 cents.
> > >> > > > > > >>>>>>>>>>
> > >> > > > > > >>>>>>>>>> I am a big fan of ZooKeeper but am also against
> > >> > > > > > >>>>>>>>>> adding multiple hops in the system, which can add
> > >> > > > > > >>>>>>>>>> unnecessary complexity. Here I am not able to
> > >> > > > > > >>>>>>>>>> understand the requirement for ZooKeeper; maybe I am
> > >> > > > > > >>>>>>>>>> wrong because I know less about the Airavata system
> > >> > > > > > >>>>>>>>>> as a whole. So I would like to discuss the following
> > >> > > > > > >>>>>>>>>> points.
> > >> > > > > > >>>>>>>>>>
> > >> > > > > > >>>>>>>>>> 1. How will it help us in making the system more
> > >> > > > > > >>>>>>>>>> reliable? ZooKeeper is not able to restart services.
> > >> > > > > > >>>>>>>>>> At most it can tell whether a service is up or not,
> > >> > > > > > >>>>>>>>>> which could only be the case if the Airavata service
> > >> > > > > > >>>>>>>>>> goes down gracefully and we have an automated way to
> > >> > > > > > >>>>>>>>>> restart it. If this is just a matter of routing
> > >> > > > > > >>>>>>>>>> client requests to the available Thrift servers, then
> > >> > > > > > >>>>>>>>>> this can be achieved with the help of a load
> > >> > > > > > >>>>>>>>>> balancer, which I guess is already on the Thrift wish
> > >> > > > > > >>>>>>>>>> list.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> We have multiple Thrift services and currently we
> > >> > > > > > >>>>>>>>> start only one instance of each, and each Thrift
> > >> > > > > > >>>>>>>>> service is a stateless service. To keep high
> > >> > > > > > >>>>>>>>> availability we have to start multiple instances of
> > >> > > > > > >>>>>>>>> them in a production scenario. So for clients to get
> > >> > > > > > >>>>>>>>> an available Thrift service we can use ZooKeeper
> > >> > > > > > >>>>>>>>> znodes to represent each available service. There are
> > >> > > > > > >>>>>>>>> some libraries which do something similar[1] and I
> > >> > > > > > >>>>>>>>> think we can use them directly.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>> 2. As far as registering of different providers is
> > >> > > > > > >>>>>>>>>> concerned, do you think we really need an external
> > >> > > > > > >>>>>>>>>> store for that?
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> Yes, I think so, because it is lightweight and
> > >> > > > > > >>>>>>>>> reliable, and we have to do a very minimal amount of
> > >> > > > > > >>>>>>>>> work to bring all these features to Airavata because
> > >> > > > > > >>>>>>>>> ZooKeeper handles all the complexity.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>> I have seen people using ZooKeeper more for state
> > >> > > > > > >>>>>>>>>> management in distributed environments.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> +1, we might not be the most effective users of
> > >> > > > > > >>>>>>>>> ZooKeeper because all of our services are stateless
> > >> > > > > > >>>>>>>>> services, but my point is that we can use ZooKeeper to
> > >> > > > > > >>>>>>>>> achieve fault tolerance, and with minimal work.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>> I would like to understand more about how we can
> > >> > > > > > >>>>>>>>>> leverage ZooKeeper in Airavata to make the system
> > >> > > > > > >>>>>>>>>> reliable.
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>> [1] https://github.com/eirslett/thrift-zookeeper
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>> Regards,
> > >> > > > > > >>>>>>>>>> Gagan
> > >> > > > > > >>>>>>>>>> On 12-Jun-2014 12:33 am, "Marlon Pierce" <
> > >> > [email protected]
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > >>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> Thanks for the summary, Lahiru. I'm cc'ing the
> > >> > > > > > >>>>>>>>>>> Architecture list for additional comments.
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> Marlon
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> On 6/11/14 2:27 PM, Lahiru Gunathilake wrote:
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> Hi All,
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> I did little research about Apache Zookeeper[1]
> > and
> > >> > how
> > >> > > to
> > >> > > > > use
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> it
> > >> > > > > > >>>>
> > >> > > > > > >>>>> in
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>>  airavata. Its really a nice way to achieve fault
> > >> > tolerance
> > >> > > > and
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> reliable
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> communication between our thrift services and
> > >> clients.
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> Zookeeper
> > >> > > > > > >>>>
> > >> > > > > > >>>>> is a
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>>  distributed, fault tolerant system to do a
> reliable
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> communication
> > >> > > > > > >>>>
> > >> > > > > > >>>>>  between
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> distributed applications. This is like an
> > in-memory
> > >> > file
> > >> > > > > > system
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> which
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>>  has
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> nodes in a tree structure and each node can
> have
> > >> small
> > >> > > > > amount
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> of
> > >> > > > > > >>>>
> > >> > > > > > >>>>> data
> > >> > > > > > >>>>>>>
> > >> > > > > > >>>>>>>>  associated with it and these nodes are called
> > znodes.
> > >> > > Clients
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>> can
> > >> > > > > > >>>>
> > >> > > > > > >>>>>  connect
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> to a zookeeper server and add/delete and update
> > >> these
> > >> > > > > znodes.
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> In Apache Airavata we start multiple thrift
> > >> > > > > > >>>>>>>>>>>> services, and these can go down for maintenance
> > >> > > > > > >>>>>>>>>>>> or crash. If we use Zookeeper to store their
> > >> > > > > > >>>>>>>>>>>> configuration (the thrift service
> > >> > > > > > >>>>>>>>>>>> configurations), we can achieve a very reliable
> > >> > > > > > >>>>>>>>>>>> system. Basically, thrift clients can
> > >> > > > > > >>>>>>>>>>>> dynamically discover available services by
> > >> > > > > > >>>>>>>>>>>> using ephemeral znodes (here we do not have to
> > >> > > > > > >>>>>>>>>>>> change the generated thrift client code, but we
> > >> > > > > > >>>>>>>>>>>> do have to change the locations where we invoke
> > >> > > > > > >>>>>>>>>>>> it). Ephemeral znodes are removed when the
> > >> > > > > > >>>>>>>>>>>> thrift service goes down, and Zookeeper
> > >> > > > > > >>>>>>>>>>>> guarantees the atomicity of these operations.
> > >> > > > > > >>>>>>>>>>>> With this approach we can have a node hierarchy
> > >> > > > > > >>>>>>>>>>>> for multiple instances of the airavata,
> > >> > > > > > >>>>>>>>>>>> orchestrator, appcatalog and gfac thrift
> > >> > > > > > >>>>>>>>>>>> services.
> > >> > > > > > >>>>>>>>>>>>
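To make the ephemeral-znode idea above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the ZooKeeper client API: the class ToyZk, the /airavata/gfac path layout, the session names and the host:port values are all assumptions made up for illustration. It only shows the behaviour described above: a service registers an ephemeral node, clients list the children to discover it, and the node disappears when the owning session ends.

```python
# Toy in-memory model of ephemeral znodes. This is NOT the ZooKeeper
# client API; names and paths here are hypothetical.
class ToyZk:
    def __init__(self):
        self.nodes = {}   # znode path -> data stored in the node
        self.owners = {}  # znode path -> session that created it

    def create_ephemeral(self, session, path, data):
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        self.nodes[path] = data
        self.owners[path] = session

    def children(self, parent):
        # Direct children only, mirroring a tree-structured namespace.
        prefix = parent.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes if p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)

    def get(self, path):
        return self.nodes[path]

    def close_session(self, session):
        # Ephemeral nodes vanish together with the session that owns them.
        for path in [p for p, s in self.owners.items() if s == session]:
            del self.nodes[path]
            del self.owners[path]


def discover(zk, service):
    """Return the endpoints currently registered for a service."""
    parent = f"/airavata/{service}"  # hypothetical path layout
    return [zk.get(f"{parent}/{name}") for name in zk.children(parent)]


zk = ToyZk()
zk.create_ephemeral("session-1", "/airavata/gfac/instance-1", "host-a:8950")
zk.create_ephemeral("session-2", "/airavata/gfac/instance-2", "host-b:8950")
print(discover(zk, "gfac"))   # both instances are visible
zk.close_session("session-1")  # simulate instance-1 going down
print(discover(zk, "gfac"))   # only instance-2 remains
```

With a real Zookeeper ensemble the session expiry would be detected by the server rather than by an explicit close_session call; the sketch only mirrors the visible effect on clients.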
> > >> > > > > > >>>>>>>>>>>> Specifically for gfac, we can have different
> > >> > > > > > >>>>>>>>>>>> types of services for each provider
> > >> > > > > > >>>>>>>>>>>> implementation. This can be achieved by using
> > >> > > > > > >>>>>>>>>>>> the hierarchical support in Zookeeper and adding
> > >> > > > > > >>>>>>>>>>>> some logic to the gfac-thrift service so that it
> > >> > > > > > >>>>>>>>>>>> registers itself under a defined path. Using the
> > >> > > > > > >>>>>>>>>>>> same logic, the orchestrator can discover the
> > >> > > > > > >>>>>>>>>>>> provider-specific gfac thrift service and route
> > >> > > > > > >>>>>>>>>>>> the message to the correct thrift service.
> > >> > > > > > >>>>>>>>>>>>
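The provider-specific routing above could look roughly like the sketch below. It uses a plain dictionary as a stand-in for the znode tree, so nothing here talks to Zookeeper. The root path /airavata/gfac, the provider names "gram" and "ssh", the instance names and the addresses are assumptions for illustration only; the thread does not define a path layout.

```python
import random

GFAC_ROOT = "/airavata/gfac"  # hypothetical root path for gfac services


def register(registry, provider, instance, address):
    """What a gfac-thrift service might do on startup: publish its address
    under a provider-specific path."""
    registry[f"{GFAC_ROOT}/{provider}/{instance}"] = address


def endpoints_for(registry, provider):
    """What the orchestrator might do: list every address under one provider."""
    prefix = f"{GFAC_ROOT}/{provider}/"
    return sorted(addr for path, addr in registry.items()
                  if path.startswith(prefix))


def route(registry, provider):
    """Pick one registered gfac service for the given provider."""
    candidates = endpoints_for(registry, provider)
    if not candidates:
        raise LookupError(f"no gfac service registered for {provider!r}")
    return random.choice(candidates)


registry = {}  # stand-in for the znode tree: path -> data
register(registry, "gram", "instance-1", "host-a:8950")
register(registry, "ssh", "instance-1", "host-b:8950")
register(registry, "ssh", "instance-2", "host-c:8950")
print(route(registry, "ssh"))  # one of the two ssh instances
```

Random selection is just the simplest choice for the sketch; how the orchestrator should pick among several instances of the same provider is a separate design question.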
> > >> > > > > > >>>>>>>>>>>> With this approach I think we simply have to
> > >> > > > > > >>>>>>>>>>>> write some client code in the thrift services
> > >> > > > > > >>>>>>>>>>>> and clients. The Zookeeper server installation
> > >> > > > > > >>>>>>>>>>>> can be done as a separate process, and it will
> > >> > > > > > >>>>>>>>>>>> be easier to keep the Zookeeper server separate
> > >> > > > > > >>>>>>>>>>>> from Airavata because installing a Zookeeper
> > >> > > > > > >>>>>>>>>>>> server is a little complex in a production
> > >> > > > > > >>>>>>>>>>>> scenario. I think we have to make sure
> > >> > > > > > >>>>>>>>>>>> everything works fine when there is no Zookeeper
> > >> > > > > > >>>>>>>>>>>> running, e.g. enable.zookeeper=false should work
> > >> > > > > > >>>>>>>>>>>> fine, and users should not have to download and
> > >> > > > > > >>>>>>>>>>>> start Zookeeper.
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>>
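One possible shape for the enable.zookeeper switch mentioned above is sketched here. Only the enable.zookeeper key comes from the thread; the gfac.server.host and gfac.server.port keys, the sample values and the discover callback are hypothetical. The point is that the client falls back to a statically configured address whenever the flag is missing or false, so Zookeeper is never required.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_gfac(props, discover):
    """Return a gfac endpoint, consulting Zookeeper only when enabled.

    `discover` is a caller-supplied function that does the Zookeeper lookup;
    it is never called when enable.zookeeper is false or absent.
    """
    if props.get("enable.zookeeper", "false").lower() == "true":
        return discover()
    # Hypothetical static keys used when Zookeeper is switched off.
    return f"{props['gfac.server.host']}:{props['gfac.server.port']}"


SAMPLE = """
# hypothetical properties fragment
enable.zookeeper=false
gfac.server.host=localhost
gfac.server.port=8950
"""

props = parse_properties(SAMPLE)
print(resolve_gfac(props, discover=lambda: "host-from-zookeeper:8950"))
# prints "localhost:8950"
```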
> > >> > > > > > >>>>>>>>>>>> [1]http://zookeeper.apache.org/
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>> Thanks
> > >> > > > > > >>>>>>>>>>>> Lahiru
> > >> > > > > > >>>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>>>>
> > >> > > > > > >>>>>>>>> --
> > >> > > > > > >>>>>>>>> System Analyst Programmer
> > >> > > > > > >>>>>>>>> PTI Lab
> > >> > > > > > >>>>>>>>> Indiana University
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>>>>
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> --
> > >> > > > > > >>>>>> Best Regards,
> > >> > > > > > >>>>>> Shameera Rathnayaka.
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>> email: shameera AT apache.org , shameerainfo AT gmail.com
> > >> > > > > > >>>>>> Blog : http://shameerarathnayaka.blogspot.com/
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>>
> > >> > > > > > >>>>>
> > >> > > > > > >>>>> --
> > >> > > > > > >>>>> Supun Kamburugamuva
> > >> > > > > > >>>>> Member, Apache Software Foundation; http://www.apache.org
> > >> > > > > > >>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
> > >> > > > > > >>>>> Blog: http://supunk.blogspot.com
> > >> > > > > > >>>>>
> > >> > > > > > >>>>>
> > >> > > > > > >>>>>
>



-- 
System Analyst Programmer
PTI Lab
Indiana University
