Thanks for the feedback, Mark.

* CPI is a term we coined on an Airavata developer mail thread. It is just an API of a major Airavata component. Since we are using API to mean specifically the gateway developer's API, we didn't want to overload the term. So CPI is the interface that the Registry exposes to the Orchestrator, GFAC, etc. The mail thread is [1].

* Jobs and Tasks: yes, your restatement is correct. The idea is that a task includes pre- and post-processing for the job as well as the job itself. So the job may complete while post-processing fails, in which case the task fails.

* Jobs, Tasks, and Persistence (see "Job" in definitions): The idea I'm trying to express is that there may be a persistently streaming source of information (such as Twitter, instrument feeds, UNIDATA weather data, ...). I call this source a "Job", but it is not explicitly initiated or managed by Airavata. Airavata does manage the connection to the persistent Job for the gateway. A gateway may define a Task or workflow to use some of the data coming from the persistent Job. There is the related concept of event-driven problems (if interesting data arrives, automatically trigger something else to happen).

You raise some important points about user- or gateway-driven job, task, and workflow persistence. My assumption in the document is that Tasks have lifetimes and are initiated by the gateway, and gateways do not submit persistent jobs to resources. But there are counterexamples, such as running persistent workflows to handle incoming weather data and launch simulations through event detection (the old LEAD storm modeling use case). So this needs more thought.

* Orchestrator and Info Services: This section of the doc needs to be fleshed out with more use cases. I don't have a specific strategy in mind for how to uniformly merge information from many different types of resources (XSEDE, OSG, IaaS and PaaS clouds, etc.). In the wiki entry, I wanted to point out that this was the Orchestrator's job, and we'll fill in the details later.

I think the gateway should always be able to specify the resources to be used if it wants to, and that this should override other considerations. If Airavata gets this information from the gateway, this is what it uses to run the job. If Airavata doesn't get this information, then it does its best to decide.

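As a rough illustration of the override rule just described, here is a minimal sketch in Python (chosen for brevity). Everything in it is hypothetical: `JobRequest`, `choose_resource`, and the "take the first candidate" fallback are placeholders I made up for this note, not Airavata types or the Orchestrator's actual policy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical request shape; not an Airavata type."""
    application: str
    # Filled in by the gateway only when it wants a specific resource.
    requested_resource: Optional[str] = None


def choose_resource(request: JobRequest, candidates: Sequence[str]) -> str:
    """Pick the resource a job should run on.

    A resource named by the gateway always wins. Only when the gateway
    leaves the choice open do we fall back to our own decision, which
    here is a placeholder policy: take the first known candidate.
    """
    if request.requested_resource is not None:
        return request.requested_resource
    if not candidates:
        raise ValueError("no candidate resources to choose from")
    return candidates[0]


# The gateway's choice overrides whatever we would have picked.
print(choose_resource(JobRequest("app", "resource-b"), ["resource-a"]))
# With no preference from the gateway, the fallback policy decides.
print(choose_resource(JobRequest("app"), ["resource-a", "resource-b"]))
```

The point of keeping the fallback in one small function is that a more sophisticated policy could replace it later without touching the override check.
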
If the gateway requests a resource that Airavata knows isn't available, it either throws an exception or else internally schedules the job and puts it on hold until the resource is available again. The details depend upon the Orchestrator implementation. Currently the Orchestrator implementation does not do any sophisticated internal scheduling, but we hope to be able to smoothly add in these capabilities.

Airavata's role should be to add value to resource information services like INCA. As you know from running CIPRES, gateways will detect problems with resources more quickly than XSEDE information services, so this is the sort of information that Airavata needs to collect and expose to gateways. There may also be gateway-specific information, like code performance on specific resources, that XSEDE won't provide. Adding value also includes making it easy for gateway developers to get started quickly with a single SDK that includes everything they need (such as simple clients to INCA).

In Airavata, the exposure of this information to gateways will be through the API. Internally, it is worth considering if this should all be in the Orchestrator or if we need a different top-level component.

Socrates also claimed naivete.

Marlon

[1] http://mail-archives.apache.org/mod_mbox/airavata-dev/201401.mbox/%3CCANotcz4teRQ=l22lcubgdpwa_i5m-jpzzzmozaeqnz29009...@mail.gmail.com%3E

On 5/14/14 1:18 PM, Miller, Mark wrote:
> Thanks for doing that Marlon,
>
> I have a couple of questions, sorry for my naivete.
>
> Is the term CPI specific to Airavata? I have not heard it before.
>
> When you state:
> "Note jobs can complete but tasks can fail."
>
> do you mean:
> "Note that although jobs within a given task can complete, the task they are
> contained in may fail."?
>
> Question:
> Is it really possible for a user-initiated activity to be persistent? Maybe I
> don't understand the use case or language.
> I wonder how scalable it can be for user-initiated activity to persist.
>
> Question:
> If jobs can be persistent, tasks and workflows may also be persistent, right?
> This also seems potentially like an issue, if my understanding of persistence
> is correct.
>
> For the Orchestrator:
> If it knows the status of XSEDE and other resources, do we know how it gets
> that information? Is there a specific way it plugs in to other remote
> resources that ensures that info is provided (in other words, there are many
> kinds of resources, and perhaps many ways of broadcasting their condition; or
> maybe it is just online/offline?)
>
> Also, if Orchestrator knows the status of the remote resources, can it pass
> that information forward to the Gateway front end, so it can be printed in
> the user interface somewhere? From my perspective, it is way cooler if the
> user knows before submitting that there will be a delay, or re-routing of
> their job.
>
> Mark
>
> -----Original Message-----
> From: Marlon Pierce [mailto:[email protected]]
> Sent: Wednesday, May 14, 2014 8:02 AM
> To: [email protected]; [email protected]
> Subject: Orchestrator description draft
>
> Dear all--
>
> I've written up a draft description of the Orchestrator [1] and welcome
> comments and critiques. As with the GFAC description, this is not
> necessarily based on the current implementation. The purpose is to create an
> implementation-independent description of the Orchestrator for future
> reference.
>
> Some outcomes from this exercise:
>
> * The interactions of the Workflow Interpreter, Orchestrator, and API server
> need to be thought out. Don't take my suggestions here too seriously.
>
> * The scheduler component of the Orchestrator needs more thought, especially
> if there are multiple Orchestrators running (for load
> balancing): we don't want to run into "thread" issues if multiple schedulers
> are trying to work with the registry.
>
> * Our current concept for extending the Orchestrator is to extend the CPI.
> You would do this to implement, for example, more sophisticated scheduling.
> But we could take a GFAC approach of having a core and developer-provided
> plugins (for scheduling, quality of service, etc).
>
> Marlon
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40511565
