Hi Yuriy,

We think a similar issue was hit and fixed in the (soon to be released) GT 5.1.2.
It has not yet been backported to 5.0.x.

What is the priority on this?  How much is this affecting you / your users?

-Stu

On Sep 13, 2011, at 1:27 AM, Yuriy Halytskyy wrote:

> ok, now I can reproduce it. When the proxy expires while the job manager is
> waiting for the COMMIT_END signal, it stops accepting new jobs. It seems I
> can restore it by sending commit_end, but this still looks like a bug
> to me, as the client may lose the job id.
> 
> 
> Cheers,
> Yuriy
> 
> Excerpts from Yuriy Halytskyy's message of Tue Sep 13 15:59:09 +1200 2011:
>> Hi,
>> 
>> a user's job manager gets into a state where submission with globusrun 
>> hangs and the job is never submitted
>> 
>> server logs say:
>> Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104424Z 
>> id=30804 event=gram.job.start level=INFO 
>> gramid=/16145890501405029996/576663433152357309/
>> peer=130.216.189.203:57672
>> Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104582Z 
>> id=30804 event=gram.add_request.end level=WARN 
>> gramid=/16145890501405029996/576663433152357309/ status=-130
>> reason="the job manager was sent a stop signal (job is still running)"
>> Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104885Z 
>> id=30804 event=gram.job.end level=INFO 
>> gramid=/16145890501405029996/576663433152357309/ status=-130 msg="Request
>> start failed" reason="the job manager was sent a stop signal (job is still 
>> running)"
>> 
>> submission with globusrun hangs:
>> 
>> globusrun -batch     -r gram5.ceres.auckland.ac.nz 
>> '&(executable=echo)(arguments= 
>> hello)(job_type=single)(count=1)(hostCount=1)(vo="/nz/nesi")(maxWalltime=10)(directory=/home/smas036)'
>> globus_gram_client_callback_allow successful
>> GRAM Job submission successful
>> https://gram5.ceres.auckland.ac.nz:40398/16145891598704212781/576663433152357309/
>> 
>> 
>> submission with two-phase does not hang and results in:
>> globusrun  -batch     -r gram5.ceres.auckland.ac.nz 
>> '&(two_phase=5)(executable=echo)(arguments=
>> hello)(job_type=single)(count=1)(hostCount=1)(vo="/nz/nesi")(maxWalltime=10)(directory=/home/smas036)'
>> globus_gram_client_callback_allow successful
>> GRAM Job submission failed because the job contact string does not match any 
>> which the job manager is handling (error code 156)
>> https://gram5.ceres.auckland.ac.nz:40398/16145891597960224316/576663433152357309/
>> 
>> 
>> our users are running into this problem all the time, but I have not been able 
>> to reproduce putting the job manager into that state. They can submit again when I kill it.
>> 
>> We did not see this before our job submission software started submitting 
>> jobs with two-phase.
>> 
>> Cheers,
>> Yuriy
