Lahiru,

Awesome, glad to see the comprehensive thought in these use cases. Once we nail
these kinds of scenarios with an approach like ZK, it will be a good starting
point to explore other options.

I really like the idea of framework-level and provider-level checkpointing, and
also having Recoverable handler and provider implementations.

Suresh

 
On Jun 25, 2014, at 2:09 PM, Lahiru Gunathilake <[email protected]> wrote:

> Hi All,
> 
> I have finished the initial version of the ZK integration. Now we can start
> multiple thrift gfac services (the communication between the orchestrator
> and gfac is still RPC) and the orchestrator submits jobs to multiple gfac nodes.
> 
> I can kill a gfac node and the orchestrator will make sure jobs are not lost;
> it simply takes those jobs and re-submits them to gfac. Since GFac is a generic
> framework and we have multiple plugins developed for that framework,
> checkpointing a plugin's internal state is up to the plugin developers, but
> gfac checkpoints whether those plugins were invoked or not.
> 
> I have introduced a new interface for plugin development called Recoverable
> (RecoverableHandler and RecoverableProvider). Stateful plugins have to
> implement their recover method, and the gfac framework will make sure it is
> invoked during a re-run scenario. If a plugin is not recoverable and already
> ran (which can be found using framework checkpointing), that plugin will not
> be invoked during the re-run. For now I have implemented recoverability for a
> few plugins, and I have tested it by submitting a job to Trestles, letting it
> submit and come to the monitoring state, and then killing that gfac instance.
> The Orchestrator picks up that execution and re-submits it to another gfac
> node, and that gfac node does not re-submit the job to the computing resource,
> but simply starts monitoring; once the job is done, outputs are downloaded
> from the original output location.
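> 
> Roughly, the contract looks something like this (a sketch only; the method
> and type names are approximations of the GFac plugin API, not the exact code):
> 
> // Implemented by stateful handlers/providers that know how to resume
> // their own work after a crash of the gfac instance that started it.
> public interface Recoverable {
>     // Called by the framework during a re-run when the checkpoint shows
>     // this plugin had already been invoked on the failed gfac node.
>     void recover(JobExecutionContext context) throws GFacException;
> }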
> 
> When a particular experiment is finished all the ZK data is removed.
> 
> At this point the following things need to be done:
> 
> 1. Figure out all the stateful handlers/providers and implement
> recoverability (a rough sketch of the idea follows below this list).
> 
> Ex: the input handler is transferring 1000 files and the gfac instance crashes
> after 500 files; during the re-run it should be able to resume the transfer
> from file 501. The same logic can be applied to a single huge file. Those
> details are completely up to the plugin developer.
> 
> 2. Then we have to remove the RPC invocation and make the gfac nodes
> worker nodes.
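> 
> A minimal sketch of the resume idea from point 1 (CheckpointStore is a made-up
> interface standing in for wherever a plugin persists its own progress, e.g. a
> znode or the registry; none of these names are the real Airavata API):
> 
> import java.util.List;
> 
> interface CheckpointStore {
>     int lastCompletedIndex(String taskId);        // -1 if nothing transferred yet
>     void markCompleted(String taskId, int index); // must survive a gfac crash
> }
> 
> class InputFileTransferHandler {
>     private final CheckpointStore checkpoints;
> 
>     InputFileTransferHandler(CheckpointStore checkpoints) {
>         this.checkpoints = checkpoints;
>     }
> 
>     void transferAll(String taskId, List<String> files) throws Exception {
>         // Fresh run: start = 0. Crash after 500 files: start = 500,
>         // i.e. the re-run begins with the 501st file.
>         int start = checkpoints.lastCompletedIndex(taskId) + 1;
>         for (int i = start; i < files.size(); i++) {
>             transferOne(files.get(i));
>             checkpoints.markCompleted(taskId, i);
>         }
>     }
> 
>     private void transferOne(String file) { /* actual copy happens here */ }
> }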
> 
> Regards
> Lahiru
> 
> 
> On Wed, Jun 18, 2014 at 12:11 PM, Lahiru Gunathilake <[email protected]>
> wrote:
> 
>> Hi Eran,
>> 
>> 
>> On Tue, Jun 17, 2014 at 4:06 PM, Eran Chinthaka Withana <
>> [email protected]> wrote:
>> 
>>> Storm has a Kafka spout which manages the cursor location (pointer to the
>>> head of the queue representing the next message to be processed) inside
>>> ZK.
>>> Each storm spout instance uses this information to get the next item to
>>> process. Storm kafka spout won't advance to the next message until it gets
>>> an ack from the storm topology.
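>>> 
>>> Roughly how that is wired up (from memory, using the old storm-kafka
>>> classes; the topic, ZK hosts, paths and ids here are made up):
>>> 
>>> import backtype.storm.topology.TopologyBuilder;
>>> import storm.kafka.KafkaSpout;
>>> import storm.kafka.SpoutConfig;
>>> import storm.kafka.ZkHosts;
>>> 
>>> public class JobTopologySketch {
>>>     public static void main(String[] args) {
>>>         // The spout stores its consumer offset under zkRoot/id in ZooKeeper
>>>         // and only advances it once the tuple tree has been acked.
>>>         ZkHosts brokers = new ZkHosts("zk1:2181,zk2:2181");
>>>         SpoutConfig conf = new SpoutConfig(brokers, "airavata-jobs",
>>>                                            "/kafka-offsets", "gfac-consumer");
>>>         TopologyBuilder builder = new TopologyBuilder();
>>>         builder.setSpout("job-spout", new KafkaSpout(conf), 1);
>>>         // a job-submission bolt would be attached here; it acks a tuple only
>>>         // after it has finished handling that job message
>>>     }
>>> }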
>>> 
>> Suppose we have 10 jobs in the queue and 5 GFAC instances each pick one at a
>> time, successfully submit it, and then have to start taking the rest of the
>> jobs. All 5 GFAC instances are still responsible for the initially picked 5
>> jobs because those are still running, and the gfac instances are monitoring
>> them until they are done, but at the same time we have to move the cursor to
>> pick other jobs too.
>> 
>> If we ack and move the cursor just after submission, without waiting until
>> the job has actually finished, how are we going to know which gfac is
>> monitoring which set of jobs?
>> 
>> I am not seeing how to achieve the above requirement with this suggestion.
>> Maybe I am missing something here.
>> 
>> Regards
>> Lahiru
>> 
>>> 
>>> So, if the ack is sent only by the last bolt, Storm makes sure all messages
>>> are eventually processed, since an exception anywhere in the topology won't
>>> generate an ack.
>>> 
>>> Thanks,
>>> Eran Chinthaka Withana
>>> 
>>> 
>>> On Tue, Jun 17, 2014 at 12:30 PM, Lahiru Gunathilake <[email protected]>
>>> wrote:
>>> 
>>>> Hi Eran,
>>>> 
>>>> I think I should take back my last email. When I look carefully at Storm I
>>>> have the following question.
>>>> 
>>>> How are we going to store the job statuses and relaunch the jobs which were
>>>> running on failed nodes? It's true that Storm starts new workers, but there
>>>> should be a way for someone in the system to find the missing jobs. Since
>>>> we do not have a data stream, there is no use in starting new workers
>>>> unless we handle the missing jobs. I think we need to have better control
>>>> of our component and persist the states of the jobs each GFAC node is
>>>> handling. Directly using zookeeper will let us do a proper fault-tolerance
>>>> implementation.
>>>> 
>>>> Regards
>>>> Lahiru
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jun 17, 2014 at 3:14 PM, Lahiru Gunathilake <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi Supun,
>>>>> 
>>>>> I think in this use case we would only use the storm topology to do the
>>>>> communication among workers, and we would completely ignore the stream
>>>>> processing part. The Orchestrator will talk to Nimbus, and GFAC nodes will
>>>>> be worker nodes in the storm topology. But I think we can achieve an
>>>>> extremely fault-tolerant system by directly using Storm, with minimal
>>>>> changes in airavata, based on the following statement on the Storm site:
>>>>> 
>>>>> "Additionally, the Nimbus daemon and Supervisor daemons are fail-fast and
>>>>> stateless; all state is kept in Zookeeper or on local disk. This means you
>>>>> can kill -9 Nimbus or the Supervisors and they'll start back up like
>>>>> nothing happened. This design leads to Storm clusters being incredibly
>>>>> stable."
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Jun 17, 2014 at 3:02 PM, Supun Kamburugamuva <
>>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi Eran,
>>>>>> 
>>>>>> I'm using Storm every day, and this is one of the strangest things I've
>>>>>> heard about using Storm. Maybe there are more use cases for Storm other
>>>>>> than distributed stream processing. AFAIK the bolts and spouts are built
>>>>>> to handle a stream of events that don't take much time to process. In
>>>>>> Airavata we don't process the messages; instead we run experiments based
>>>>>> on the commands given.
>>>>>> 
>>>>>> If you want process isolation, distributed execution, and cluster
>>>>>> resource management, YARN would be a better thing to explore.
>>>>>> 
>>>>>> Thanks,
>>>>>> Supun..
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jun 17, 2014 at 2:27 PM, Eran Chinthaka Withana <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> Hi Lahiru,
>>>>>>> 
>>>>>>> good summarization. Thanks Lahiru.
>>>>>>> 
>>>>>>> I think you are trying to stick to a model where the Orchestrator
>>>>>>> distributes work to GFac workers and handles the impedance mismatch
>>>>>>> through a messaging solution. If you step back and think, we don't even
>>>>>>> want the orchestrator to handle everything. From its point of view, it
>>>>>>> should submit jobs to the framework and wait or get notified once the
>>>>>>> job is done.
>>>>>>> 
>>>>>>> There are multiple ways of doing this. And here is one method.
>>>>>>> 
>>>>>>> Orchestrator submits all its jobs to Job queue (implemented using
>>> any
>>>> MQ
>>>>>>> impl like Rabbit or Kafka). A storm topology is implemented to
>>> dequeue
>>>>>>> messages, process them (i.e. submit those jobs and get those
>>> executed)
>>>>>> and
>>>>>>> notify the Orchestrator with the status (either through another
>>>>>>> JobCompletionQueue or direct invocation).
>>>>>>> 
>>>>>>> With this approach, the MQ provider will help to match impedance
>>>> between
>>>>>>> job submission and consumption. Storm helps with worker
>>> coordination,
>>>>>> load
>>>>>>> balancing, throttling on your job execution framework, worker pool
>>>>>>> management and fault tolerance.
>>>>>>> 
>>>>>>> Of course, you can implement this based only on ZK and handle
>>>>>>> everything else on your own, but Storm has done exactly that with ZK
>>>>>>> underneath.
>>>>>>> 
>>>>>>> Finally, if you go for a model like this, then even beyond job
>>>>>>> submission you can use the same model for any internal communication
>>>>>>> within the framework. For example, the workflow engine will submit its
>>>>>>> jobs to queues based on what it has to do. Storm topologies exist for
>>>>>>> each queue to dequeue messages and carry out the work in a reliable
>>>>>>> manner. Consider these as mini-workflows within a larger workflow
>>>>>>> framework.
>>>>>>> 
>>>>>>> We can have a voice chat if it's more convenient. But not at 7am PST :)
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Eran Chinthaka Withana
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jun 17, 2014 at 10:12 AM, Lahiru Gunathilake <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> Ignoring the tool that we are going to use to implement fault
>>>>>>>> tolerance, I have summarized the model we have decided on so far. I
>>>>>>>> will use the tool name X; we can use Zookeeper or some other
>>>>>>>> implementation. The following design assumes that tool X and the
>>>>>>>> Registry have high availability.
>>>>>>>> 
>>>>>>>> 1. Orchestrator and GFAC worker node communication is going to be
>>>>>>>> queue based, and tool X is going to be used for this communication.
>>>>>>>> (We have to implement this with the race conditions between different
>>>>>>>> gfac workers in mind.)
>>>>>>>> 2. We have multiple instances of GFAC which are identical (in future we
>>>>>>>> can group gfac workers). The existence of each worker node is
>>>>>>>> identified using X. If a node goes down the orchestrator will be
>>>>>>>> notified by X.
>>>>>>>> 3. When a particular request comes in and is accepted by one gfac
>>>>>>>> worker, that information will be replicated in tool X, in a place where
>>>>>>>> it is persisted even if the worker fails.
>>>>>>>> 4. When a job comes to a final state like failed, cancelled or
>>>>>>>> completed, the above information will be removed. So at a given time
>>>>>>>> the orchestrator can poll the active jobs of each worker by giving a
>>>>>>>> worker ID.
>>>>>>>> 5. Tool X will make sure that when a worker goes down the orchestrator
>>>>>>>> is notified. During a worker failure, based on steps 3 and 4 the
>>>>>>>> orchestrator can poll all the active jobs of that worker and do the
>>>>>>>> same thing as in step 1 (store the experiment IDs in the queue), and a
>>>>>>>> gfac worker will pick up the jobs.
>>>>>>>> 
>>>>>>>> 6. When GFAC receives a job as in step 5 it has to carefully evaluate
>>>>>>>> the state from the registry and decide what needs to be done (if the
>>>>>>>> job is pending then gfac just has to monitor it; if the job state is,
>>>>>>>> say, input transferred but not yet submitted, gfac has to execute the
>>>>>>>> rest of the chain, submit the job to the resource and start
>>>>>>>> monitoring).
>>>>>>>> 
>>>>>>>> If we can find a tool X which supports all these features, and the tool
>>>>>>>> itself is fault tolerant and supports atomicity, high availability and
>>>>>>>> a simple API, we can use that tool.
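>>>>>>>> 
>>>>>>>> In ZK terms, steps 3-5 could look roughly like the sketch below (the
>>>>>>>> paths and class names are made up, and Curator is just one convenient
>>>>>>>> client; this is only an illustration of the model, not a design):
>>>>>>>> 
>>>>>>>> import org.apache.curator.framework.CuratorFramework;
>>>>>>>> import org.apache.curator.framework.CuratorFrameworkFactory;
>>>>>>>> import org.apache.curator.retry.ExponentialBackoffRetry;
>>>>>>>> import org.apache.zookeeper.CreateMode;
>>>>>>>> import java.util.List;
>>>>>>>> 
>>>>>>>> public class ActiveJobRegistry {
>>>>>>>>     private static final String ROOT = "/airavata/active-jobs";
>>>>>>>>     private final CuratorFramework zk;
>>>>>>>> 
>>>>>>>>     public ActiveJobRegistry(String connectString) {
>>>>>>>>         zk = CuratorFrameworkFactory.newClient(connectString,
>>>>>>>>                 new ExponentialBackoffRetry(1000, 3));
>>>>>>>>         zk.start();
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     // Step 3: record that this worker owns the experiment. PERSISTENT,
>>>>>>>>     // so the record survives even if the worker itself dies.
>>>>>>>>     public void markAccepted(String workerId, String experimentId) throws Exception {
>>>>>>>>         zk.create().creatingParentsIfNeeded().withMode(CreateMode.PERSISTENT)
>>>>>>>>           .forPath(ROOT + "/" + workerId + "/" + experimentId);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     // Step 4: remove the record once the job reaches a terminal state.
>>>>>>>>     public void markFinished(String workerId, String experimentId) throws Exception {
>>>>>>>>         zk.delete().forPath(ROOT + "/" + workerId + "/" + experimentId);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     // Step 5: on a worker failure the orchestrator lists that worker's
>>>>>>>>     // active jobs and puts the experiment IDs back on the queue.
>>>>>>>>     public List<String> activeJobsOf(String workerId) throws Exception {
>>>>>>>>         return zk.getChildren().forPath(ROOT + "/" + workerId);
>>>>>>>>     }
>>>>>>>> }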
>>>>>>>> 
>>>>>>>> WDYT ?
>>>>>>>> 
>>>>>>>> Lahiru
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jun 16, 2014 at 2:38 PM, Supun Kamburugamuva <
>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Lahiru,
>>>>>>>>> 
>>>>>>>>> Before moving ahead with an implementation it may be worth
>>>>>>>>> considering some of the following aspects as well.
>>>>>>>>> 
>>>>>>>>> 1. How do we report the progress of an experiment as state in
>>>>>>>>> ZooKeeper? What happens if a GFac instance crashes while executing an
>>>>>>>>> experiment? Are there check-points we can save so that another GFac
>>>>>>>>> instance can take over?
>>>>>>>>> 2. What is the threading model of GFac instances? (I consider this a
>>>>>>>>> very important aspect.)
>>>>>>>>> 3. What information needs to be stored in ZooKeeper? You may need to
>>>>>>>>> store other information about an experiment apart from its
>>>>>>>>> experiment ID.
>>>>>>>>> 4. How do we report errors?
>>>>>>>>> 5. For GFac, do you need a threading model or a worker-process model?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Supun..
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Jun 16, 2014 at 2:22 PM, Lahiru Gunathilake <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All,
>>>>>>>>>> 
>>>>>>>>>> I think the conclusion is like this:
>>>>>>>>>> 
>>>>>>>>>> 1. We make gfac a worker, not a thrift service, and we can start
>>>>>>>>>> multiple workers, either with a bunch of providers and handlers
>>>>>>>>>> configured in each worker, or provider-specific workers to handle
>>>>>>>>>> the classpath issues (not the common scenario).
>>>>>>>>>> 
>>>>>>>>>> 2. Gfac workers can be configured to watch a given path in
>>>>>>>>>> zookeeper, and multiple workers can listen to the same path. The
>>>>>>>>>> default path can be /airavata/gfac, or we can configure paths like
>>>>>>>>>> /airavata/gfac/gsissh and /airavata/gfac/bes.
>>>>>>>>>> 
>>>>>>>>>> 3. The Orchestrator can be configured with logic to store experiment
>>>>>>>>>> IDs in zookeeper under a path, and the orchestrator can be
>>>>>>>>>> configured with provider-specific path logic too. So when a new
>>>>>>>>>> request comes in, the orchestrator stores the experiment ID, and
>>>>>>>>>> these experiment IDs are stored in ZK as a queue.
>>>>>>>>>> 
>>>>>>>>>> 4. Since the gfac workers are watching, they will be notified and,
>>>>>>>>>> as Supun suggested, can use a leader-election algorithm[1] so that
>>>>>>>>>> one gfac worker takes the leadership for each experiment (see the
>>>>>>>>>> sketch below). If there are gfac instances for each provider, the
>>>>>>>>>> same logic will apply among the nodes with the same provider type.
>>>>>>>>>> 
>>>>>>>>>> [1] http://curator.apache.org/curator-recipes/leader-election.html
>>>>>>>>>> 
>>>>>>>>>> I would like to implement this if there are  no objections.
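>>>>>>>>>> 
>>>>>>>>>> For point 4, a minimal sketch with the Curator recipe from [1] (the
>>>>>>>>>> election path and method names are made up for illustration):
>>>>>>>>>> 
>>>>>>>>>> import org.apache.curator.framework.CuratorFramework;
>>>>>>>>>> import org.apache.curator.framework.recipes.leader.LeaderSelector;
>>>>>>>>>> import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
>>>>>>>>>> 
>>>>>>>>>> public class ExperimentLeaderElection {
>>>>>>>>>>     public static void participate(CuratorFramework zk, final String experimentId) {
>>>>>>>>>>         String electionPath = "/airavata/gfac/elections/" + experimentId;
>>>>>>>>>>         LeaderSelector selector = new LeaderSelector(zk, electionPath,
>>>>>>>>>>                 new LeaderSelectorListenerAdapter() {
>>>>>>>>>>                     @Override
>>>>>>>>>>                     public void takeLeadership(CuratorFramework client) throws Exception {
>>>>>>>>>>                         // Exactly one gfac worker at a time gets here for this
>>>>>>>>>>                         // experiment; it runs/monitors the job and returns when done.
>>>>>>>>>>                     }
>>>>>>>>>>                 });
>>>>>>>>>>         selector.start(); // every interested gfac worker calls this; one wins
>>>>>>>>>>     }
>>>>>>>>>> }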
>>>>>>>>>> 
>>>>>>>>>> Lahiru
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Jun 16, 2014 at 11:51 AM, Supun Kamburugamuva <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Marlon,
>>>>>>>>>>> 
>>>>>>>>>>> I think you are exactly correct.
>>>>>>>>>>> 
>>>>>>>>>>> Supun..
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:48 AM, Marlon Pierce <
>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Let me restate this, and please tell me if I'm wrong.
>>>>>>>>>>>> 
>>>>>>>>>>>> Orchestrator decides (somehow) that a particular job requires
>>>>>>>>>>>> JSDL/BES, so it places the Experiment ID in Zookeeper's
>>>>>>>>>>>> /airavata/gfac/jsdl-bes node. GFAC servers associated with this
>>>>>>>>>>>> instance notice the update. The first GFAC to claim the job gets
>>>>>>>>>>>> it, and uses the Experiment ID to get the detailed information it
>>>>>>>>>>>> needs from the Registry. ZooKeeper handles the locking, etc., to
>>>>>>>>>>>> make sure that only one GFAC at a time is trying to handle an
>>>>>>>>>>>> experiment.
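>>>>>>>>>>>> 
>>>>>>>>>>>> One simple way ZooKeeper gives that "first GFAC to claim it wins"
>>>>>>>>>>>> behaviour is an ephemeral claim node; a rough illustration only
>>>>>>>>>>>> (the paths are made up):
>>>>>>>>>>>> 
>>>>>>>>>>>> import org.apache.zookeeper.*;
>>>>>>>>>>>> 
>>>>>>>>>>>> public class ExperimentClaim {
>>>>>>>>>>>>     // Every GFAC tries to create the same EPHEMERAL node; exactly one
>>>>>>>>>>>>     // create succeeds, the rest see NodeExistsException. The claim
>>>>>>>>>>>>     // disappears automatically if the winner's ZK session dies.
>>>>>>>>>>>>     public static boolean tryClaim(ZooKeeper zk, String experimentId, String gfacId)
>>>>>>>>>>>>             throws KeeperException, InterruptedException {
>>>>>>>>>>>>         String path = "/airavata/gfac/jsdl-bes/" + experimentId + "/claim";
>>>>>>>>>>>>         try {
>>>>>>>>>>>>             zk.create(path, gfacId.getBytes(),
>>>>>>>>>>>>                       ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>>>>>>>>>>>>             return true;   // we own this experiment
>>>>>>>>>>>>         } catch (KeeperException.NodeExistsException e) {
>>>>>>>>>>>>             return false;  // another GFAC already claimed it
>>>>>>>>>>>>         }
>>>>>>>>>>>>     }
>>>>>>>>>>>> }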
>>>>>>>>>>>> 
>>>>>>>>>>>> Marlon
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 6/16/14, 11:42 AM, Lahiru Gunathilake wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:38 AM, Supun Kamburugamuva <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Lahiru,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My suggestion is that maybe you don't need a Thrift service
>>>>>>>>>>>>>> between the Orchestrator and the component executing the
>>>>>>>>>>>>>> experiment. When a new experiment is submitted, the orchestrator
>>>>>>>>>>>>>> decides who can execute this job. Then it puts the information
>>>>>>>>>>>>>> about this experiment execution in ZooKeeper. The component
>>>>>>>>>>>>>> which wants to execute the experiment is listening to this
>>>>>>>>>>>>>> ZooKeeper path, and when it sees the experiment it will execute
>>>>>>>>>>>>>> it. So the communication happens through a state change in
>>>>>>>>>>>>>> ZooKeeper. This can potentially simplify your architecture.
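>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> A minimal sketch of that listening side, using Curator's
>>>>>>>>>>>>>> PathChildrenCache (the path and payload are only placeholders):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> import org.apache.curator.framework.CuratorFramework;
>>>>>>>>>>>>>> import org.apache.curator.framework.recipes.cache.PathChildrenCache;
>>>>>>>>>>>>>> import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> public class ExperimentWatcher {
>>>>>>>>>>>>>>     public static void watch(CuratorFramework zk) throws Exception {
>>>>>>>>>>>>>>         PathChildrenCache cache =
>>>>>>>>>>>>>>                 new PathChildrenCache(zk, "/airavata/experiments", true);
>>>>>>>>>>>>>>         cache.getListenable().addListener((client, event) -> {
>>>>>>>>>>>>>>             if (event.getType() == PathChildrenCacheEvent.Type.CHILD_ADDED) {
>>>>>>>>>>>>>>                 String znodePath = event.getData().getPath(); // created by the orchestrator
>>>>>>>>>>>>>>                 byte[] payload = event.getData().getData();   // e.g. the experiment ID
>>>>>>>>>>>>>>                 // execute the experiment here, then delete or update the znode
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>         });
>>>>>>>>>>>>>>         cache.start();
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }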
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Supun.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Jun 16, 2014 at 11:14 AM, Lahiru Gunathilake <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So your suggestion is to create a znode for each thrift service
>>>>>>>>>>>>>>> we have, and when a request comes in that node gets modified
>>>>>>>>>>>>>>> with the input data for the request, and the thrift service has
>>>>>>>>>>>>>>> a watch on that node, so it will be notified because of the
>>>>>>>>>>>>>>> watch and can read the input from zookeeper and invoke the
>>>>>>>>>>>>>>> operation?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Jun 12, 2014 at 11:50 PM, Supun Kamburugamuva <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Here is what I think about Airavata and ZooKeeper. In Airavata there are many components, and these components must be stateless to achieve scalability and reliability. Also, there must be a mechanism to communicate between the components. At the moment Airavata uses RPC calls based on Thrift for the communication.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> ZooKeeper can be used both as a place to hold state and as a communication layer between the components. I'm involved with a project that has many distributed components like Airavata. Right now we use Thrift services to communicate among the components, but we find it difficult to use RPC calls and achieve stateless behaviour, and we are thinking of replacing the Thrift services with a ZooKeeper-based communication layer. So I think it is better to explore the possibility of removing the Thrift services between the components and using ZooKeeper as a communication mechanism between the services. If you do this you will have to move the state to ZooKeeper, and you will automatically achieve stateless behaviour in the components.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Also, I think trying to make ZooKeeper optional is a bad idea. If we are integrating something as fundamentally important to the architecture as how we store state, we shouldn't make it optional.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Supun..
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Jun 12, 2014 at 10:57 PM, Shameera Rathnayaka <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Lahiru,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> As I understood it, you are trying to achieve not only reliability but some other requirements by introducing zookeeper, like health monitoring of the services, categorization by service implementation, etc. In that case I think we can make use of zookeeper's features, but if we only focus on reliability I have a little bit of concern: why can't we use clustering + LB?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, it is better if we add Zookeeper as a prerequisite when a user needs to use it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>  Shameera.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Thu, Jun 12, 2014 at 5:19 AM, Lahiru Gunathilake <[email protected]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Gagan,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I need to start another discussion about it, but I
>>> had
>>>>>> an
>>>>>>>>> offline
>>>>>>>>>>>>>>>>>> discussion with Suresh about auto-scaling. I will
>>>> start
>>>>>>>> another
>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>> about this topic too.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Jun 11, 2014 at 4:10 PM, Gagan Juneja <[email protected]>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks Lahiru for pointing to a nice library; added to my dictionary :).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I would like to know how we are planning to start multiple servers.
>>>>>>>>>>>>>>>>>>> 1. Spawning new servers based on load? Sometimes we call this auto-scalable.
>>>>>>>>>>>>>>>>>>> 2. Keeping some specific number of nodes available, e.g. we want 2 servers to be available at any time, so if one goes down then I need to spawn a new one to bring the available server count back to 2.
>>>>>>>>>>>>>>>>>>> 3. Initially starting all the servers.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> In scenarios 1 and 2 zookeeper does make sense, but I don't believe the existing architecture supports this?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Gagan
>>>>>>>>>>>>>>>>>>> On 12-Jun-2014 1:19 am, "Lahiru Gunathilake" <[email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Gagan,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for your response. Please see my inline
>>>>>> comments.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, Jun 11, 2014 at 3:37 PM, Gagan Juneja <[email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Lahiru,
>>>>>>>>>>>>>>>>>>>>> Just my 2 cents.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I am a big fan of zookeeper but I am also against adding multiple hops in the system, which can add unnecessary complexity. Here I am not able to understand the requirement for zookeeper; maybe I am wrong because of my limited knowledge of the airavata system as a whole. So I would like to discuss the following points.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. How will it help us make the system more reliable? Zookeeper is not able to restart services. At most it can tell whether a service is up or not, which could only be the case if the airavata service goes down gracefully and we have some automated way to restart it. If this is just a matter of routing client requests to the available thrift servers, then this can be achieved with the help of a load balancer, which I guess is already on the thrift wish list.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We have multiple thrift services, and currently we start only one instance of each; each thrift service is a stateless service. To keep high availability we have to start multiple instances of them in a production scenario. So for clients to get an available thrift service we can use zookeeper znodes to represent each available service. There are some libraries which do something similar[1] and I think we can use them directly.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2. As far as registering different providers is concerned, do you think we really need an external store for that?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Yes I think so, because it is lightweight and reliable, and we have to do a very minimal amount of work to bring all these features to Airavata because zookeeper handles all the complexity.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have seen people using zookeeper more for state management in distributed environments.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> +1, we might not be the most effective users of zookeeper because all of our services are stateless services, but my point is that we can use zookeeper to achieve fault tolerance with minimal work.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I would like to understand more about how we can leverage zookeeper in airavata to make the system reliable.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1] https://github.com/eirslett/thrift-zookeeper
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> Gagan
>>>>>>>>>>>>>>>>>>>>> On 12-Jun-2014 12:33 am, "Marlon Pierce" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks for the summary, Lahiru. I'm cc'ing the Architecture list
>>>>>>>>>>>>>>>>>>>>>> for additional comments.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Marlon
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On 6/11/14 2:27 PM, Lahiru Gunathilake wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I did a little research about Apache Zookeeper[1] and how to use it in airavata. It is really a nice way to achieve fault tolerance and reliable communication between our thrift services and clients. Zookeeper is a distributed, fault-tolerant system for reliable communication between distributed applications. It is like an in-memory file system which has nodes in a tree structure, and each node can have a small amount of data associated with it; these nodes are called znodes. Clients can connect to a zookeeper server and add, delete and update these znodes.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> In Apache Airavata we start multiple thrift services, and these can go down for maintenance or they can crash. If we use zookeeper to store these configurations (thrift service configurations) we can achieve a very reliable system. Basically, thrift clients can dynamically discover an available service by using ephemeral znodes (here we do not have to change the generated thrift client code, but we have to change the locations where we invoke them). Ephemeral znodes will be removed when the thrift service goes down, and zookeeper guarantees the atomicity of these operations. With this approach we can have a node hierarchy for the multiple airavata, orchestrator, appcatalog and gfac thrift services.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Specifically for gfac we can have different types of services for each provider implementation. This can be achieved by using the hierarchical support in zookeeper and providing some logic in the gfac-thrift service to register itself to a defined path. Using the same logic, the orchestrator can discover the provider-specific gfac thrift service and route the message to the correct thrift service.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> With this approach I think we simply have to write some client code in the thrift services and clients. The zookeeper server installation can be done as a separate process, and it will be easier to keep the Zookeeper server separate from Airavata because installation of a Zookeeper server is a little complex in a production scenario. I think we have to make sure everything works fine when there is no Zookeeper running; e.g. enable.zookeeper=false should work fine and users shouldn't have to download and start zookeeper.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> [1] http://zookeeper.apache.org/
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> System Analyst Programmer
>>>>>>>>>>>>>>>>>>>> PTI Lab
>>>>>>>>>>>>>>>>>>>> Indiana University
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> System Analyst Programmer
>>>>>>>>>>>>>>>>>> PTI Lab
>>>>>>>>>>>>>>>>>> Indiana University
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Shameera Rathnayaka.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> email: shameera AT apache.org , shameerainfo AT gmail.com
>>>>>>>>>>>>>>>>> Blog : http://shameerarathnayaka.blogspot.com/
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Supun Kamburugamuva
>>>>>>>>>>>>>>>> Member, Apache Software Foundation;
>>>> http://www.apache.org
>>>>>>>>>>>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>>>>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> System Analyst Programmer
>>>>>>>>>>>>>>> PTI Lab
>>>>>>>>>>>>>>> Indiana University
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Supun Kamburugamuva
>>>>>>>>>>>>>> Member, Apache Software Foundation;
>>> http://www.apache.org
>>>>>>>>>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Supun Kamburugamuva
>>>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> System Analyst Programmer
>>>>>>>>>> PTI Lab
>>>>>>>>>> Indiana University
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Supun Kamburugamuva
>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> System Analyst Programmer
>>>>>>>> PTI Lab
>>>>>>>> Indiana University
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Supun Kamburugamuva
>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>> E-mail: [email protected];  Mobile: +1 812 369 6762
>>>>>> Blog: http://supunk.blogspot.com
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> System Analyst Programmer
>>>>> PTI Lab
>>>>> Indiana University
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> System Analyst Programmer
>>>> PTI Lab
>>>> Indiana University
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> System Analyst Programmer
>> PTI Lab
>> Indiana University
>> 
> 
> 
> 
> -- 
> System Analyst Programmer
> PTI Lab
> Indiana University
