OK, now regardless of the job's exit status, I am getting a job exit code of 0:

Message : OK
Code : 200
Length : 131
Chunked : false
Type : application/x-globus-gram
Protocol-version : 2
Status : 8
Failure-code : 0
Job failure code : 0
Job exit code : 0

Cheers,
Yuriy

Excerpts from Yuriy Halytskyy's message of Fri Aug 12 13:25:56 +1200 2011:
> > The restart operation ought to work.
>
> Yeah it does, my mistake. I tried to restart job with very short proxy
> lifetime and it returned 131 status. Also new job submissions fail
> with proxy lifetime < 10 min (?). But if the proxy is fresh, restart
> works as it should.
>
> > (potentially) the TCP port number. This ought to be enough to get the
> > audit record to happen.
>
> The problem with that is jobs should be audited regardless of how user
> behaves. If job is never checked again (user just copied outputs over
> gridftp and forgot about the job), there is no record.
>
> > Note that if you do the GRAM two-phase commit protocol, the job state
> > will remain in place until a client acknowledges it, so that you can do
> > #2 and check for status whenever you are able, even after the job
> > terminates.
>
> thanks, this is exactly what I need. My code used to assume that if
> restart fails, the job is complete, but using two-phase I can actually
> see what happened.
>
> Cheers,
> Yuriy
>
> Excerpts from Joseph Bester's message of Fri Aug 12 02:48:59 +1200 2011:
> > On Aug 10, 2011, at 7:26 PM, Yuriy Halytskyy wrote:
> > > Hi,
> > >
> > > When job proxy expires, its manager dies, but the job itself keeps
> > > running. There is no way to check its status, and even when job is
> > > submitted with save_state=true, the restart does not work. Also, audit
> > > never puts the record of completed job into the database.
> > >
> > > On the other hand if I submit two jobs, first one with long proxy, and
> > > second job with shorter proxy, even when second proxy expires I can
> > > still query the job as long as first proxy is valid and job manager is
> > > running.
> > >
> > > GRAM4 never had this problem, even when proxy expires job status is
> > > still available and it is properly audited. Is it possible for gram5
> > > to have the same behaviour? At least being able to restart job manager
> > > after proxy expiration and have it properly audited.
> >
> > The restart operation ought to work. There are a few ways it can happen;
> > depending on how you are monitoring the job, you might have to do
> > different things.
> >
> > 1. Submit any job to the same resource as the original job. When a new
> >    job manager is started, it will resume monitoring whatever jobs remain
> >    from previous job managers. Job state callbacks will be sent to clients
> >    which were registered to the previous job manager process with the new
> >    job manager contact. This contact will be the same as the old contact
> >    except for (potentially) the TCP port number. This ought to be enough
> >    to get the audit record to happen.
> >
> > 2. Submit a job with the RSL &(restart=old-job-manager-contact). The
> >    response to this will be the new job contact and the jobs current
> >    state. If there was no job manager running, it will act like #1 as
> >    well, resuming all existing job monitoring and state callback
> >    operations.
> >
> > If you attempt to use the gram status API instead of relying on
> > callbacks, you won't be able to get status unless you do #2, because you
> > won't know the port to contact. I'd like to some day add more messaging
> > through the gatekeeper so that the job manager doesn't have to have it's
> > own port for receiving messages and we don't have to deal with such
> > problems.
> >
> > Note that if you do the GRAM two-phase commit protocol, the job state
> > will remain in place until a client acknowledges it, so that you can do
> > #2 and check for status whenever you are able, even after the job
> > terminates.
> >
> > Joe
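
For anyone following along, here is a rough sketch of how the restart RSL from option #2 might be assembled on the client side. It only builds the string; it does not talk to GRAM. The function name restart_rsl, the example contact URL, and the set of characters it rejects are my own assumptions for illustration, and attaching a two_phase timeout to a restart request is likewise an assumption rather than something confirmed in this thread.

```python
# Hypothetical helper: builds the RSL string for a GRAM restart request.
# Nothing here is part of the Globus client API; it is plain string handling.

# Characters treated as unsafe in an unquoted RSL value (conservative guess).
_UNSAFE = set("()&|=<>!\"'^#$+")


def restart_rsl(old_contact, two_phase_timeout=None):
    """Return '&(restart=<old_contact>)', optionally with a two_phase timeout.

    old_contact is the job contact returned when the job was first submitted.
    two_phase_timeout, if given, is a number of seconds (assumed usage).
    """
    if not old_contact:
        raise ValueError("job contact must not be empty")
    if any(ch.isspace() or ch in _UNSAFE for ch in old_contact):
        raise ValueError("job contact contains characters unsafe in RSL")

    rsl = "&(restart=" + old_contact + ")"
    if two_phase_timeout is not None:
        rsl += "(two_phase=" + str(int(two_phase_timeout)) + ")"
    return rsl


if __name__ == "__main__":
    # Made-up contact, in the general shape of a job manager contact URL.
    contact = "https://gram.example.org:50000/123/456/"
    print(restart_rsl(contact))
    print(restart_rsl(contact, two_phase_timeout=60))
```

The resulting string would then be submitted to the same resource as the original job, using whatever client you normally submit RSL with.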
