Thanks for the responses. I will take these points into consideration during the cancel implementation.
Lahiru

On Wed, Aug 13, 2014 at 7:33 PM, Eroma Abeysinghe <[email protected]> wrote:

> My questions and thoughts on Experiment cancellation
>
> 1. What are we going to do with the output or partial output of the job
> at the time of cancelling?
> Are we going to discard it or make it available for the experiment? Are
> we safekeeping all the job information and messages on CANCELLED jobs,
> or discarding them as well?
>
> 2. Are we going to allow editing for CANCELLED or CANCELLING experiments?
> IMO we should not, because editing is only needed if the experiment is
> going to be re-launched.
>
> 3. With the existing experiment and job states we need to decide which
> can be CANCELLED.
> Out of the Airavata Experiment states, cancellation should be allowed
> for these states:
> CREATED
> VALIDATED
> SCHEDULED
> LAUNCHED
> EXECUTING
> Cancellation should be communicated to resources if the job state is:
> SUBMITTED
> SETUP
> QUEUED
> ACTIVE
> HELD
>
> There is a SUSPENDED state in both experiment and job, but is this a
> currently active state?
>
> 4. Cloning will be available for CANCELLED and CANCELLING experiments.
>
> 5. In the Experiment Summary we should display any errors that took
> place in the cancelling process.
>
> On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <[email protected]> wrote:
>
>> There is an advantage for task (or job) state to capture the information
>> that really comes from the machine (completed, cancelled, failed, etc.),
>> and for experiment state to be set to canceled by Airavata. That is,
>> there should be parts of Airavata that capture machine-specific state
>> information about the job for logging/auditing purposes.
>>
>> * Airavata issues "cancel" command to job in "launched" or "executing"
>> state.
>>
>> * Airavata confirms that the job has left the queue or is no longer
>> executing. This could be machine-specific, but the main question is "has
>> the job left the queue?" or "is the job no longer in executing state?"
>> I don't think it is "if this is trestles, and since we issued a qdel
>> command, is the job marked as completed; or if this is stampede, is the
>> job now marked as failed?"
>>
>> * If the job cancel works, then Airavata marks this as canceled.
>>
>> * If cancel fails for some reason, don't change the Experiment state but
>> throw an error.
>>
>>
>> Marlon
>>
>>
>> On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
>>
>>> Hi All,
>>>
>>> I have a few concerns about experiment cancellation. When we want to
>>> cancel an experiment we have to run a particular command on the
>>> computing resource. Different computing resources show the job status
>>> of cancelled jobs in different ways. Ex: trestles shows cancelled jobs
>>> as completed, some other machines show them as cancelled, and some
>>> might show them as failed.
>>>
>>> I think we should replicate this information in the JobDetails object
>>> as the Job status and make sure the Experiment and Task statuses are
>>> set to cancelled. The other approach is that when we cancel, we
>>> explicitly mark all the states in the experiment model (experiment,
>>> task, and job states) as cancelled and manually handle the state we
>>> get from the computing resource.
>>>
>>> My concern: should we really hide the information shown by the
>>> computing resource from the Job status we are storing in the registry?
>>> Or leave it as it is and handle other statuses to represent the
>>> cancelled experiments? If we make everything cancelled there will be
>>> inconsistency in the JobStatus.
>>>
>>> WDYT ?
>>>
>>> Lahiru
>>>
>>
>
> --
> Thank You,
> Best Regards,
> Eroma
>
--
System Analyst Programmer
PTI Lab
Indiana University
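
For reference, the cancellation flow discussed in this thread could be sketched roughly as below. This is only an illustration in Python, not Airavata code: the class names, the `issue_cancel` and `query_resource_state` callables, and the `STILL_ACTIVE_ON_RESOURCE` set are all hypothetical. The two state lists are the ones proposed in Eroma's point 3, and the steps follow Marlon's bullets (issue cancel, confirm the job has left the queue, mark the experiment cancelled, otherwise throw an error and leave the experiment state alone).

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Experiment states for which cancellation is allowed (from the thread).
CANCELLABLE_EXPERIMENT_STATES = {
    "CREATED", "VALIDATED", "SCHEDULED", "LAUNCHED", "EXECUTING",
}
# Job states for which the cancel must be communicated to the resource.
NOTIFY_RESOURCE_JOB_STATES = {"SUBMITTED", "SETUP", "QUEUED", "ACTIVE", "HELD"}
# ASSUMPTION: raw resource states meaning "still in the queue or running".
STILL_ACTIVE_ON_RESOURCE = {"QUEUED", "RUNNING", "HELD"}


class CancelError(Exception):
    """Raised when a cancel is not allowed or could not be confirmed."""


@dataclass
class Job:
    state: str


@dataclass
class Experiment:
    state: str
    job: Optional[Job] = None


def cancel_experiment(
    exp: Experiment,
    issue_cancel: Callable[[Job], None],        # hypothetical, e.g. runs qdel
    query_resource_state: Callable[[Job], str],  # hypothetical status probe
) -> Experiment:
    if exp.state not in CANCELLABLE_EXPERIMENT_STATES:
        raise CancelError(f"cannot cancel experiment in state {exp.state}")

    job = exp.job
    if job is not None and job.state in NOTIFY_RESOURCE_JOB_STATES:
        issue_cancel(job)
        raw_state = query_resource_state(job)
        # The only question asked is "has the job left the queue?",
        # not how a particular machine labels a cancelled job.
        if raw_state in STILL_ACTIVE_ON_RESOURCE:
            # Cancel failed: leave the experiment state unchanged.
            raise CancelError(f"job still {raw_state} on the resource")
        # Keep whatever the machine reported (completed/cancelled/failed).
        job.state = raw_state

    # Only the experiment-level state is set to CANCELLED by the caller side.
    exp.state = "CANCELLED"
    return exp
```

With this shape, a trestles-style resource that reports a cancelled job as COMPLETED would leave the job state as COMPLETED for auditing while the experiment itself ends up CANCELLED.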
