Hi,

A user's job manager gets into a state where submission with globusrun hangs 
and the job is never submitted.

The server logs say:
Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104424Z id=30804 event=gram.job.start level=INFO gramid=/16145890501405029996/576663433152357309/ peer=130.216.189.203:57672
Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104582Z id=30804 event=gram.add_request.end level=WARN gramid=/16145890501405029996/576663433152357309/ status=-130 reason="the job manager was sent a stop signal (job is still running)"
Sep 13 15:35:30 gram5 gridinfo[30804]: ts=2011-09-13T03:35:30.104885Z id=30804 event=gram.job.end level=INFO gramid=/16145890501405029996/576663433152357309/ status=-130 msg="Request start failed" reason="the job manager was sent a stop signal (job is still running)"
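
In case it helps anyone look for the same failure in their own logs, here is a 
minimal sketch of how these entries could be picked out. It assumes each entry 
occupies a single line in the syslog file and keeps the key=value layout shown 
above. The function names (parse_gram_line, find_stop_signal_failures) are 
hypothetical helpers of mine, not part of GRAM.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a bare token.
PAIR_RE = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_gram_line(line):
    """Return the key=value fields of one GRAM syslog line as a dict."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        # Strip the surrounding quotes from quoted values such as reason="...".
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def find_stop_signal_failures(lines, status="-130"):
    """Return parsed gram.job.end events whose status equals `status`.

    The default of -130 is simply the status seen in the log excerpt above.
    """
    hits = []
    for line in lines:
        fields = parse_gram_line(line)
        if fields.get("event") == "gram.job.end" and fields.get("status") == status:
            hits.append(fields)
    return hits
```

Calling find_stop_signal_failures(open(path)) on a syslog file (path is whatever 
your syslog configuration writes to) would return one dict per failed request, 
including its gramid and reason fields.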

Submission with globusrun hangs after printing:

globusrun -batch     -r gram5.ceres.auckland.ac.nz 
'&(executable=echo)(arguments= 
hello)(job_type=single)(count=1)(hostCount=1)(vo="/nz/nesi")(maxWalltime=10)(directory=/home/smas036)'
globus_gram_client_callback_allow successful
GRAM Job submission successful
https://gram5.ceres.auckland.ac.nz:40398/16145891598704212781/576663433152357309/


Submission with two-phase commit does not hang, but fails with:
globusrun  -batch     -r gram5.ceres.auckland.ac.nz 
'&(two_phase=5)(executable=echo)(arguments=
hello)(job_type=single)(count=1)(hostCount=1)(vo="/nz/nesi")(maxWalltime=10)(directory=/home/smas036)'
globus_gram_client_callback_allow successful
GRAM Job submission failed because the job contact string does not match any 
which the job manager is handling (error code 156)
https://gram5.ceres.auckland.ac.nz:40398/16145891597960224316/576663433152357309/


Our users run into this problem all the time, but I cannot reproduce whatever 
puts the job manager into that state. They can submit again once I kill the 
job manager.

We had not seen this before our job submission software started submitting 
jobs with two-phase commit.

Cheers,
Yuriy