How goes the implementation?

Marlon

On 8/13/14, 11:09 PM, Lahiru Gunathilake wrote:
Thank you very much for all the input! I will take these points into
consideration.

Regards
Lahiru


On Wed, Aug 13, 2014 at 10:31 PM, Miller, Mark <[email protected]> wrote:

  If I understand this correctly, I want to offer some input from our
experience with CIPRES.

Currently, if a CIPRES user wishes to cancel a job, they must delete the
entire job, and they therefore lose all ability to view the input and
other files that were used.

This is not an ideal solution.



There is value to the user in being able to see partially completed
results, or even the input files they used.



So I would vote for making partial output of the job available as an
option.

Any additional information you can provide about status would be useful,
especially for folks who are debugging failures.



Just my 2c.



Mark



*From:* Eroma Abeysinghe [mailto:[email protected]]
*Sent:* Wednesday, August 13, 2014 7:04 AM
*To:* [email protected]
*Subject:* Re: Experiment Cancellation



My questions and thoughts on experiment cancellation:
1. What are we going to do with the output or partial output of the job at
the time of cancelling?
     Are we going to discard it or make it available in the experiment? Are
we going to keep all the job information and messages for CANCELLED jobs,
or discard them as well?

2. Are we going to allow editing of CANCELLED or CANCELLING experiments?
IMO we should not, because editing is only needed if the experiment is
going to be re-launched.

3. Given the existing experiment and job states, we need to decide which
of them can be CANCELLED.
Of the Airavata experiment states, cancellation should be allowed for:
CREATED
VALIDATED
SCHEDULED
LAUNCHED
EXECUTING
Cancellation should be communicated to the resource if the job state is
one of:
SUBMITTED
SETUP
QUEUED
ACTIVE
HELD


There is a SUSPENDED state for both experiments and jobs, but is this
state currently in use?

4. Cloning will be available for CANCELLED and CANCELLING experiments.

5. In the Experiment Summary we should display any errors that took place
during the cancellation process.
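
As a rough illustration of point 3 above, the two state lists could be kept
as simple lookup sets. This is only a sketch in Python: the constant and
function names are hypothetical and are not taken from the Airavata code
base, and SUSPENDED is deliberately left out because its handling is still
an open question.

```python
# Experiment states from which a cancel request would be accepted
# (taken from the list in point 3; the names below are hypothetical).
CANCELLABLE_EXPERIMENT_STATES = frozenset(
    {"CREATED", "VALIDATED", "SCHEDULED", "LAUNCHED", "EXECUTING"}
)

# Job states for which the cancel has to be communicated to the resource.
RESOURCE_CANCEL_JOB_STATES = frozenset(
    {"SUBMITTED", "SETUP", "QUEUED", "ACTIVE", "HELD"}
)


def can_cancel_experiment(experiment_state: str) -> bool:
    """Return True if an experiment in this state may be cancelled."""
    return experiment_state in CANCELLABLE_EXPERIMENT_STATES


def must_notify_resource(job_state: str) -> bool:
    """Return True if the computing resource must be told about the cancel."""
    return job_state in RESOURCE_CANCEL_JOB_STATES
```

With these sets, an experiment that is already CANCELLED is rejected, and a
job that has not yet been SUBMITTED never triggers a call to the resource.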





On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <[email protected]> wrote:

There is an advantage for task (or job) state to capture the information
that really comes from the machine (completed, cancelled, failed, etc), and
for experiment state to be set to canceled by Airavata.  That is, there
should be parts of Airavata that capture machine-specific state information
about the job for logging/auditing purposes.

* Airavata issues "cancel" command to job in "launched" or "executing"
state.

* Airavata confirms that the job has left the queue or is no longer
executing. This could be machine-specific, but the main question is "has
the job left the queue?" or "is the job no longer in executing state?"  I
don't think it is "if this is trestles, and since we issued a qdel command,
is the job marked as completed; or if this is stampede, is the job now
marked as failed?"

* If the job cancel works, Airavata marks the experiment as canceled.

* If cancel fails for some reason, don't change the Experiment state but
throw an error.
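
The steps above could be sketched roughly as follows. This is a minimal
Python sketch under stated assumptions, not the Airavata implementation:
`CancelError`, `cancel_experiment`, and the two callables `issue_cancel` and
`job_still_active` are hypothetical names, and the experiment is modelled as
a plain dict with a "state" key.

```python
class CancelError(Exception):
    """Raised when a cancel cannot be issued or cannot be confirmed."""


def cancel_experiment(experiment, issue_cancel, job_still_active, attempts=3):
    """Cancel an experiment, changing its state only once the job is gone.

    experiment       -- dict with a "state" key (hypothetical model)
    issue_cancel     -- callable that sends the cancel command to the resource
    job_still_active -- callable returning True while the job is still
                        queued or executing on the resource
    attempts         -- how many times to check before giving up
    """
    if experiment["state"] not in ("LAUNCHED", "EXECUTING"):
        raise CancelError("cannot cancel from state " + experiment["state"])

    issue_cancel()

    # Ask only the machine-independent question: has the job left the queue
    # or stopped executing? The status the machine reports is not inspected.
    for _ in range(attempts):
        if not job_still_active():
            experiment["state"] = "CANCELLED"
            return experiment

    # Cancel could not be confirmed: leave the experiment state untouched.
    raise CancelError("job is still queued or executing after cancel")
```

A real implementation would presumably wait between checks; the sketch omits
the delay to keep the control flow visible.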


Marlon



On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:

Hi All,

I have a few concerns about experiment cancellation. When we want to cancel
an experiment, we have to run a particular command on the computing
resource. Different computing resources show the status of cancelled jobs
in different ways. E.g., trestles shows cancelled jobs as completed, some
other machines show them as cancelled, and some might show them as failed.

I think we should replicate this information in the JobDetails object as
the job status, and make sure the experiment and task statuses are set to
cancelled. The other approach is that when we cancel, we explicitly mark
all the states in the experiment model (experiment, task, and job states)
as cancelled and handle the state we get from the computing resource
manually.

My concern is: should we really hide the information shown by the
computing resource from the job status we are storing in the registry? Or
should we leave it as it is and handle the other statuses so that they
represent cancelled experiments? If we mark everything as cancelled, there
will be an inconsistency in the JobStatus.
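
To make the first option concrete, here is a small Python sketch of a model
that keeps the status reported by the resource next to the experiment state
set by Airavata. The class and field names (`JobStatusRecord`,
`ExperimentRecord`, `record_cancel`) are hypothetical and only illustrate
the idea; they are not the actual registry model.

```python
from dataclasses import dataclass


@dataclass
class JobStatusRecord:
    # Status exactly as the computing resource reports it, e.g. "COMPLETED"
    # on a machine that shows cancelled jobs as completed.
    resource_reported_status: str
    # Set by us, so a reader can tell why the job ended.
    cancel_requested: bool = False


@dataclass
class ExperimentRecord:
    state: str
    job: JobStatusRecord


def record_cancel(experiment: ExperimentRecord, reported_status: str) -> None:
    """Store the resource's own status and mark the experiment cancelled."""
    experiment.job.cancel_requested = True
    experiment.job.resource_reported_status = reported_status
    experiment.state = "CANCELLED"
```

For example, after a cancel on a machine that reports "COMPLETED", the
experiment state is "CANCELLED" while the job record still holds
"COMPLETED", so nothing the resource reported is hidden.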

WDYT ?

Lahiru






--

Thank You,

Best Regards,

Eroma



