Are you sure that the Job IDs referenced in the SEG output (looks like
xxx.yyy.zzz) match the Job IDs that WS-GRAM thinks it has gotten back
from the perl jobmanagers? I've done one of these second-condor
jobmanagers before for OSG's ManagedFork jobmanager, and there was
some problem where the scripts were reporting xxx.0, but the SEG was
reporting on xxx.000.000. WS-GRAM won't realize that those are
supposed to be the same, so you can either modify the behavior of your
foo.pm or your SEG so they match up. If it's not obvious at your
current level of logging, bump the GRAM logging up to DEBUG in the
container-log4j.properties file.
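For what it's worth, here is a rough Python sketch of the kind of comparison I mean. It is not GRAM code: the function names are made up for illustration, the five-field layout of a SEG line is assumed from the sample output quoted below, and treating missing ID components as zero is my assumption about how xxx.0 and xxx.000.000 relate.

```python
def normalize_job_id(job_id):
    """Return a (cluster, proc, subproc) tuple of ints.

    Assumption: components missing from a short ID such as "98.0" are zero,
    and zero-padding such as "098" carries no meaning.
    """
    parts = [int(p) for p in job_id.strip().split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("unexpected job ID format: %r" % job_id)
    return tuple(parts + [0] * (3 - len(parts)))


def seg_job_id(seg_line):
    """Pull the job ID out of one SEG output line.

    Assumption: the line has five ';'-separated fields and the job ID is
    the third one, as in "001;1234804765;098.000.000;1;0".
    """
    fields = seg_line.strip().split(";")
    if len(fields) != 5:
        raise ValueError("unexpected SEG line: %r" % seg_line)
    return fields[2]


def same_job(script_id, seg_line):
    """True if the ID reported by the script and the ID in the SEG line
    refer to the same job once both are normalized."""
    return normalize_job_id(script_id) == normalize_job_id(seg_job_id(seg_line))
```

With these definitions, same_job("98.0", "001;1234804765;098.000.000;1;0") is true, while a plain string comparison of "98.0" and "098.000.000" is not. That string mismatch is the kind of thing that would leave WS-GRAM waiting for events it never recognizes.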
Charles
On Feb 16, 2009, at 11:28 AM, Andre Charbonneau wrote:
Greetings,
I'm currently trying to deploy a new job manager and scheduler event
generator and I'm having some problems.
Basically, what I am trying to do is to have a second condor job
manager and scheduler interface and SEG module, but with a different
name (foo).
To get started, I simply cloned the code from the existing condor job
manager, scheduler provider and SEG module and changed the names in
the various files to refer to 'foo' instead of 'condor'.
So far, I'm able to submit my job and the job runs to completion. The
problem I'm having is that the globusrun-ws client does not seem to
get any notifications, even though my SEG module seems to be working
fine. It simply waits forever after I submit my job. For example:
globusrun-ws -submit -s -Ft Foo -Jf creds.epr -Sf creds.epr -Tf creds.epr -F ******* -f myjob.xml
Submitting job...Done.
Job ID: uuid:f4df4ba2-fc4d-11dd-8f66-00b0d0e1435d
Termination time: 02/17/2009 17:19 GMT
Current job state: Unsubmitted
I checked if my SEG module is running and it looks ok:
ps -ef |grep globus-scheduler-event-generator
globus 26288 26229 0 11:53 ? 00:00:00 /usr/local/globus/libexec/globus-scheduler-event-generator -s foo -t 1234802181
globus 26300 26229 0 11:53 ? 00:00:00 /usr/local/globus/libexec/globus-scheduler-event-generator -s fork -t 1234801181
globus 26744 26229 0 12:10 ? 00:00:00 /usr/local/globus/libexec/globus-scheduler-event-generator -s condor -t 1234803725
globus 26748 2895 0 12:10 pts/0 00:00:00 grep globus-scheduler-event-generator
And when I run it by hand, it looks like it is behaving OK too:
001;1234804765;098.000.000;1;0
001;1234804806;098.000.000;2;0
001;1234804823;098.000.000;8;0
I've compared my code with the one from the Condor job manager and I
can't find what I'm missing.
Has anyone else had similar issues when deploying their custom-made
job managers? Is anything other than the SEG module required for the
job state notifications to be properly sent to the client?
(I'm using GT 4.0.8)
Thanks,
Andre