Re: Orchestrator description draft

Marlon Pierce Fri, 16 May 2014 06:56:07 -0700

Thanks for the feedback, Mark. 

* CPI is a term we coined on an Airavata developer mail thread.  It is
just an API of a major Airavata component. Since we are using API to
mean specifically the gateway developer's API, we didn't want to
overload the term.  So CPI is the interface that the Registry exposes to
the Orchestrator, GFAC, etc.  The mail thread is [1].

* Jobs and Tasks: yes, your restatement is correct. The idea is that
tasks include pre- and post-processing for the job as well as the job
itself. So the job may complete while post-processing fails, and then
the task as a whole fails.
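
A minimal Python sketch of the job/task distinction described above. It
is not taken from the Airavata code base: the names State, TaskResult,
run_task, and failing_stage_out are hypothetical, and the three stages
are stand-in callables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class TaskResult:
    job_state: State   # outcome of the job alone
    task_state: State  # outcome of pre-processing + job + post-processing


def run_task(pre: Callable[[], None],
             job: Callable[[], None],
             post: Callable[[], None]) -> TaskResult:
    """Run pre-processing, the job, then post-processing as one task."""
    try:
        pre()
        job()
    except Exception:
        # Pre-processing or the job itself failed, so the job did not complete.
        return TaskResult(State.FAILED, State.FAILED)
    try:
        post()
    except Exception:
        # The job completed, but the enclosing task still fails.
        return TaskResult(State.COMPLETED, State.FAILED)
    return TaskResult(State.COMPLETED, State.COMPLETED)


def failing_stage_out() -> None:
    # Hypothetical post-processing step that always fails.
    raise IOError("could not copy output files back")


result = run_task(lambda: None, lambda: None, failing_stage_out)
print(result.job_state.value, result.task_state.value)
```

Here the job is reported as completed while the task is reported as
failed, which is the case the wiki sentence is meant to cover.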

* Jobs, Tasks, and Persistence (see "Job" in definitions): The idea
I'm trying to express is that there may be a persistently streaming
source of information (such as Twitter, instrument feeds, UNIDATA
weather data, ...). I call this source a "Job" but it is not explicitly
initiated or managed by Airavata.  Airavata does manage the connection
to the persistent Job for the gateway. A gateway may define a Task or
workflow to use some of the data coming from the persistent job.  There
is the related concept of event-driven problems (if interesting data
arrives, automatically trigger something else to happen).
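
The event-driven case can be sketched roughly as follows, in Python.
This is only an illustration under assumptions: watch_stream,
is_interesting, and launch_task are hypothetical names, the feed is a
small in-memory list standing in for a persistent source, and the
threshold is made up.

```python
from typing import Callable, Iterable, List


def watch_stream(events: Iterable[dict],
                 is_interesting: Callable[[dict], bool],
                 launch_task: Callable[[dict], str]) -> List[str]:
    """Consume events from a persistent source and launch a task for
    each event that the gateway-defined predicate marks as interesting."""
    launched = []
    for event in events:
        if is_interesting(event):
            launched.append(launch_task(event))
    return launched


# Stand-in for a persistent feed; a real source would not be a finite list.
feed = [
    {"station": "A", "reflectivity_dbz": 20},
    {"station": "B", "reflectivity_dbz": 55},
]

task_ids = watch_stream(
    feed,
    is_interesting=lambda e: e["reflectivity_dbz"] >= 50,  # made-up threshold
    launch_task=lambda e: f"forecast-{e['station']}",      # hypothetical task id
)
print(task_ids)
```

The source itself is never started or stopped by this code; only the
tasks triggered from it have lifetimes.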

You raise some important points about user- or gateway-driven job, task
and workflow persistence.  My assumption in the document is that Tasks
have lifetimes and are initiated by the gateway, and gateways do not
submit persistent jobs to resources.  But there are counterexamples,
such as running persistent workflows to handle incoming weather data and
launch simulations through event detection (the old LEAD storm modeling
use case).  So this needs more thought.

* Orchestrator and Info Services: This section of the doc needs to be
fleshed out with more use cases.  I don't have a specific strategy in
mind for how to uniformly merge information from many different types of
resources (XSEDE, OSG, IaaS and PaaS clouds, etc.).  In the wiki entry, I
wanted to point out that this was the Orchestrator's job, and we'll fill
in the details later. 

I think the gateway should always be able to specify the resources to be
used if it wants to, and that this should override other considerations.
If Airavata gets this information from the gateway, this is what it uses
to run the job. If Airavata doesn't get this information, then it does
its best to decide.  If the gateway requests a resource that Airavata
knows isn't available, it either throws an exception or else internally
schedules the job and puts it on hold until the resource is available
again.  The details depend upon the Orchestrator implementation. 
Currently the Orchestrator implementation does not do any sophisticated
internal scheduling, but we hope to be able to smoothly add in these
capabilities.
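
The selection rule in the paragraph above could look roughly like this
in Python. It is a sketch under stated assumptions, not the current
Orchestrator behavior: choose_resource, ResourceUnavailable,
OnUnavailable, and the resource names are all hypothetical, and "pick
the first available resource" is only a placeholder for whatever
Airavata does when it has to decide on its own.

```python
from enum import Enum
from typing import Dict, Optional


class ResourceUnavailable(Exception):
    """Raised when no suitable resource can be used right now."""


class OnUnavailable(Enum):
    RAISE = "raise"  # throw an exception back to the gateway
    HOLD = "hold"    # accept the job and hold it until the resource returns


def choose_resource(requested: Optional[str],
                    available: Dict[str, bool],
                    on_unavailable: OnUnavailable = OnUnavailable.RAISE
                    ) -> Dict[str, str]:
    """Decide where a job runs; a gateway request always takes priority."""
    if requested is None:
        # No preference from the gateway: placeholder for real scheduling.
        for name, is_up in available.items():
            if is_up:
                return {"resource": name, "status": "SCHEDULED"}
        raise ResourceUnavailable("no resource is currently available")
    if available.get(requested, False):
        return {"resource": requested, "status": "SCHEDULED"}
    # The gateway asked for a resource known to be unavailable.
    if on_unavailable is OnUnavailable.HOLD:
        return {"resource": requested, "status": "HELD"}
    raise ResourceUnavailable(f"{requested} is not available")


status = {"cluster-a": False, "cluster-b": True}  # hypothetical resources
print(choose_resource(None, status))
print(choose_resource("cluster-a", status, OnUnavailable.HOLD))
```

Whether the unavailable case raises or holds is left as a parameter
here because the text says that choice depends on the Orchestrator
implementation.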

Airavata's role should be to add value to resource information services
like INCA. As you know from running CIPRES, gateways will detect
problems with resources more quickly than XSEDE information services, so
this is the sort of information that Airavata needs to collect and
expose to gateways.  There may also be gateway-specific information,
like code performance on specific resources, that XSEDE won't provide. 
Adding value also includes making it easy for gateway developers to get
started quickly with a single SDK that includes everything they need
(such as simple clients to INCA).

In Airavata, the exposure of this information to gateways will be
through the API.  Internally, it is worth considering whether this
should all be in the Orchestrator or whether we need a different
top-level component.

Socrates also claimed naivete.

Marlon

[1]
http://mail-archives.apache.org/mod_mbox/airavata-dev/201401.mbox/%3CCANotcz4teRQ=l22lcubgdpwa_i5m-jpzzzmozaeqnz29009...@mail.gmail.com%3E

On 5/14/14 1:18 PM, Miller, Mark wrote:
> Thanks for doing that Marlon,
>
> I have a couple of questions, sorry for my naivete.
>
> Is the term CPI specific to Airavata? I have not heard it before.
>
> When you state:
> "Note jobs can complete but tasks can fail."
>
> do you mean:
> "Note that although jobs within a given task can complete,  the task they are 
> contained in may fail."?
>
> Question:
> Is it really possible for a user-initiated activity to be persistent? Maybe I 
> don't understand the use case or language.
> I wonder how scalable it can be for user-initiated activity to persist.
>
> Question:
> If jobs can be persistent, tasks and workflows may also be persistent, right? 
> This also seems potentially like an issue, if my understanding of persistence 
> is correct.
>
> For the Orchestrator:
> If it knows the status of XSEDE and other resources, do we know how it gets 
> that information? Is there a specific way it plugs in to other remote 
> resources that ensures that info is provided (in other words, there are many 
> kinds of resources, and perhaps many ways of broadcasting their condition; or 
> maybe it is just online/offline?)
>
> Also, if Orchestrator knows the status of the remote resources, can it pass 
> that information forward to the Gateway front end, so it can be printed in 
> the user interface somewhere? From my perspective, it is way cooler if the 
> user knows before submitting that there will be a delay, or re-routing of 
> their job.
>
> Mark
>
> -----Original Message-----
> From: Marlon Pierce [mailto:[email protected]] 
> Sent: Wednesday, May 14, 2014 8:02 AM
> To: [email protected]; [email protected]
> Subject: Orchestrator description draft
>
> Dear all--
>
> I've written up a draft description of the Orchestrator [1] and welcome 
> comments and critiques.  As with the GFAC description, this is not 
> necessarily based on the current implementation.  The purpose is to create an 
> implementation-independent description of the Orchestrator for future 
> reference.
>
> Some outcomes from this exercise:
>
> * The interactions of the Workflow Interpreter, Orchestrator, and API server 
> need to be thought out. Don't take my suggestions here too seriously.
>
> * The scheduler component of the Orchestrator needs more thought, especially 
> if there are multiple Orchestrators running (for load balancing): we don't 
> want to run into "thread" issues if multiple schedulers are trying to work 
> with the registry.
>
> * Our current concept for extending the Orchestrator is to extend the CPI.  
> You would do this to implement, for example, more sophisticated scheduling.  
> But we could take a GFAC approach of having a core and developer-provided 
> plugins (for scheduling, quality of service, etc).
>
> Marlon
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40511565
