Hi All,

I have added a Google doc [1] that anyone can comment on.
[1] https://docs.google.com/document/d/11fjql09tOiC0NLBaqdhZ9WAiMoBhkBJl7WC1N7DigcU/edit

Regards
Lahiru

On Thu, Dec 5, 2013 at 2:34 PM, Lahiru Gunathilake <[email protected]> wrote:

> Hi All,
>
> We are thinking of implementing an Airavata Orchestrator component to
> replace WorkflowInterpreter, so that gateway developers do not have to
> deal with workflows when they simply have a single independent job to run
> in their gateways. This component mainly focuses on how to invoke GFAC and
> accept requests from the client API.
>
> I have the following features in mind for this component.
>
> 1. It provides a web service or REST interface, against which we can
> implement a client to submit jobs.
>
> 2. It accepts a job request and parses the input types; if the input types
> are correct, it creates an Airavata experiment ID.
>
> 3. The Orchestrator then stores the job information in the registry against
> the generated experiment ID (all the other components identify the job
> using this experiment ID).
>
> 4. After that, the Orchestrator pulls up all the descriptors related to
> this request, does some scheduling to decide where to run the job, and
> submits the job to a GFAC node (handling multiple GFAC nodes is going to be
> a future improvement in the Orchestrator).
>
> Pull-based job submission might be a good way to handle errors: if we
> store jobs in the Registry and GFAC nodes pull jobs and execute them, the
> Orchestrator component really doesn't have to worry about error handling.
>
> This is because we can implement logic in GFAC so that, if a particular
> job has not updated its status for a given time, GFAC assumes that the job
> has hung or that the GFAC node handling that job has failed. A GFAC node
> then pulls that job (we definitely need a locking mechanism here, so that
> two instances do not both execute the hung job) and starts executing it.
> (Even while GFAC is handling a long-running job, it still has to update
> the job status frequently, with the same status, to show that the GFAC
> node is running.)
>
> 5. GFAC creates its execution chain and stores it back in the registry
> with the experiment ID, and GFAC updates its state using checkpointing.
>
> 6. If we are not doing pull-based submission, then during a GFAC failure
> the Orchestrator has to identify it and resubmit the active jobs from the
> failed GFAC node to other nodes. This might cause job duplication if the
> Orchestrator raises a false alarm about a GFAC failure (so we have to
> handle this carefully).
>
> We have a lot more to discuss about GFAC, but I am limiting our discussion
> to the Orchestrator component for now.
>
> WDYT about this design?
>
> Lahiru
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University

--
System Analyst Programmer
PTI Lab
Indiana University
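
To make the pull-based idea in point 4 a bit more concrete, here is a rough sketch of how a GFAC node might claim a job from the registry under a lock, and how a job whose status has not been refreshed for a given time becomes claimable by another node. This is only an illustration under my own assumptions, not Airavata code: the `Registry` and `JobRecord` classes, the status strings, the `stale_after` timeout and the node names are all hypothetical, and the in-memory dictionary merely stands in for the real registry.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobRecord:
    """Hypothetical registry entry, keyed by experiment ID."""
    experiment_id: str
    status: str = "SUBMITTED"      # hypothetical status names
    owner: Optional[str] = None    # GFAC node currently executing the job
    last_update: float = 0.0       # time of the last status update


class Registry:
    """In-memory stand-in for the registry (an assumption for this sketch)."""

    def __init__(self, stale_after: float) -> None:
        self._lock = threading.Lock()          # the locking mechanism
        self._jobs: Dict[str, JobRecord] = {}
        self.stale_after = stale_after         # seconds without an update

    def store(self, experiment_id: str, now: float) -> None:
        """Orchestrator side: record a new job against its experiment ID."""
        with self._lock:
            self._jobs[experiment_id] = JobRecord(experiment_id, last_update=now)

    def heartbeat(self, experiment_id: str, node: str, now: float) -> bool:
        """GFAC side: re-report the same status to show the node is alive."""
        with self._lock:
            job = self._jobs[experiment_id]
            if job.owner != node:
                return False                   # job was taken over by another node
            job.last_update = now
            return True

    def claim(self, node: str, now: float) -> Optional[str]:
        """GFAC side: pull one job that is unowned or has gone stale."""
        with self._lock:
            for job in self._jobs.values():
                if job.status == "COMPLETED":
                    continue
                stale = now - job.last_update > self.stale_after
                if job.owner is None or stale:
                    job.owner = node
                    job.status = "EXECUTING"
                    job.last_update = now
                    return job.experiment_id
        return None


# Usage: two hypothetical GFAC nodes competing for one job.
registry = Registry(stale_after=30.0)
registry.store("exp-1", now=0.0)
first = registry.claim("gfac-a", now=1.0)     # gfac-a pulls the new job
second = registry.claim("gfac-b", now=5.0)    # nothing to pull, job is fresh
registry.heartbeat("exp-1", "gfac-a", now=20.0)
third = registry.claim("gfac-b", now=45.0)    # still fresh after the heartbeat
fourth = registry.claim("gfac-b", now=60.0)   # heartbeat is now too old
```

Because the check and the ownership change happen under the same lock, two nodes cannot both take over the same hung job. In a real deployment the lock would have to live in the registry itself (or some shared store) rather than in one process, which this sketch does not attempt to show.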
