Re: [gt-user] GRAM 5 behaviour when job proxy expires

Yuriy Halytskyy Thu, 11 Aug 2011 19:10:12 -0700

OK, now regardless of the job's exit status, I am getting an exit code of 0:

Message : OK
Code    : 200
Length  : 131
Chunked : false
Type    : application/x-globus-gram
Protocol-version : 2
Status           : 8
Failure-code     : 0
Job failure code     : 0
Job exit code     : 0
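
For anyone reading a dump like the one above programmatically, here is a
minimal Python sketch. It assumes the output is plain "Key : Value" lines as
shown; parse_gram_dump is a hypothetical helper, not part of any Globus API.
The state numbers are the GRAM protocol job-state values (8 is DONE, 4 is
FAILED).

```python
# GRAM protocol job-state values (8 = DONE is the state shown in the dump).
JOB_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}


def parse_gram_dump(text):
    """Hypothetical helper: turn 'Key : Value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


dump = """\
Message : OK
Code    : 200
Status           : 8
Failure-code     : 0
Job failure code     : 0
Job exit code     : 0
"""

fields = parse_gram_dump(dump)
state = JOB_STATES.get(int(fields["Status"]), "UNKNOWN")
print(state, fields["Job exit code"])  # prints: DONE 0
```

With the dump above this reports DONE with exit code 0, which is the
behaviour being described: the state is correct but the exit code is always 0.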



Cheers,
Yuriy

Excerpts from Yuriy Halytskyy's message of Fri Aug 12 13:25:56 +1200 2011:
> > The restart operation ought to work.
> 
> Yeah it does, my mistake. I tried to restart the job with a very short proxy
> lifetime and it returned status 131. Also, new job submissions fail
> with a proxy lifetime < 10 min (?). But if the proxy is fresh, restart
> works as it should.
> 
> >    (potentially) the TCP port number. This ought to be enough to get the
> >    audit record to happen.
> 
> The problem with that is that jobs should be audited regardless of how the
> user behaves. If the job is never checked again (the user just copied the
> outputs over GridFTP and forgot about the job), there is no record.
> 
> > Note that if you do the GRAM two-phase commit protocol, the job state will
> > remain in place until a client acknowledges it, so that you can do #2 and
> > check for status whenever you are able, even after the job terminates.
> 
> Thanks, this is exactly what I need. My code used to assume that if the
> restart fails, the job is complete, but using two-phase commit I can actually
> see what happened.
> 
> Cheers,
> Yuriy
> 
> Excerpts from Joseph Bester's message of Fri Aug 12 02:48:59 +1200 2011:
> > On Aug 10, 2011, at 7:26 PM, Yuriy Halytskyy wrote:
> > > Hi,
> > > 
> > > When a job's proxy expires, its job manager dies, but the job itself keeps
> > > running. There is no way to check its status, and even when the job is
> > > submitted with save_state=true, the restart does not work. Also, audit
> > > never puts the record of the completed job into the database.
> > > 
> > > On the other hand, if I submit two jobs, the first with a long proxy and
> > > the second with a shorter proxy, then even when the second proxy expires I
> > > can still query the job as long as the first proxy is valid and the job
> > > manager is running.
> > > 
> > > GRAM4 never had this problem: even when the proxy expires, job status is
> > > still available and the job is properly audited. Is it possible for GRAM5
> > > to have the same behaviour? At the least, I would like to be able to restart
> > > the job manager after proxy expiration and have the job properly audited.
> > 
> > The restart operation ought to work. There are a few ways it can happen;
> > depending on how you are monitoring the job, you might have to do different
> > things.
> > 
> > 1. Submit any job to the same resource as the original job. When a new job
> >    manager is started, it will resume monitoring whatever jobs remain from
> >    previous job managers. Job state callbacks will be sent to clients which
> >    were registered to the previous job manager process with the new job
> >    manager contact. This contact will be the same as the old contact except
> >    for (potentially) the TCP port number. This ought to be enough to get the
> >    audit record to happen.
> > 
> > 2. Submit a job with the RSL &(restart=old-job-manager-contact). The
> >    response to this will be the new job contact and the job's current state.
> >    If there was no job manager running, it will act like #1 as well, resuming
> >    all existing job monitoring and state callback operations.
> > 
> > If you attempt to use the GRAM status API instead of relying on callbacks,
> > you won't be able to get status unless you do #2, because you won't know the
> > port to contact. I'd like to some day add more messaging through the
> > gatekeeper so that the job manager doesn't have to have its own port for
> > receiving messages and we don't have to deal with such problems.
> > 
> > Note that if you do the GRAM two-phase commit protocol, the job state will
> > remain in place until a client acknowledges it, so that you can do #2 and
> > check for status whenever you are able, even after the job terminates.
> > 
> > Joe
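
Option #2 in Joe's reply can be sketched in Python as below. This only builds
the restart RSL string; how it is submitted depends on the client in use. The
helper names and the example contact are hypothetical, and the sketch assumes
that the contact may be passed as a double-quoted RSL value with embedded
quotes doubled.

```python
def rsl_quote(value):
    """Wrap a value in double quotes for RSL, doubling any embedded quotes.

    Assumption: quoting the job contact is accepted by the RSL parser.
    """
    return '"' + value.replace('"', '""') + '"'


def restart_rsl(old_contact):
    """Build the restart RSL described in option #2 (hypothetical helper)."""
    return "&(restart=" + rsl_quote(old_contact) + ")"


# Placeholder contact; a real one is returned when the job is first submitted.
old_contact = "https://grid.example.org:50001/1234/5678/"
print(restart_rsl(old_contact))
# prints: &(restart="https://grid.example.org:50001/1234/5678/")
```

The response to a submission with this RSL should carry the new job contact,
which is the one to use for later status queries because the port may differ.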
