Hi,

I have been working with Globus for several months now, and we have had
great success using it to drive a Torque/PBS scheduled cluster.

Recently I have been trying to get our system working with SGE. There, GRAM
never reports a failure, only successful completion, even though a qacct -j
on the job in question clearly shows a non-zero exit status.  After an
initial wrong turn of trying to edit the poll() method in sge.pm, I found
that this method is never called. A Google search then brought me to a
previous message on this list:

http://www.mail-archive.com/[email protected]/msg03507.html

There I learned that Globus depends on the
$SGE_ROOT/$SGE_CELL/common/reporting file for its job exit status
information.

We do have SGE generating this file, and we have verified that it correctly
records a non-zero exit status for jobs that GRAM reports as exiting with
zero.
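
In case it helps anyone reproduce the mismatch, here is a minimal sketch of
pulling the exit status out of qacct -j output so it can be compared with
what GRAM reports. The function names and the sample text are made up for
illustration, and the only thing assumed about qacct is that it prints an
"exit_status" line for each job record. It does not touch the reporting file
itself, since I do not yet know the format Globus expects there.

```python
import re
import subprocess


def qacct_text(job_id):
    """Run `qacct -j <job_id>` and return its text output.

    Needs qacct on PATH; raises CalledProcessError if qacct fails.
    """
    result = subprocess.run(
        ["qacct", "-j", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def exit_statuses(qacct_output):
    """Return every exit_status value found in qacct -j text, in order."""
    statuses = []
    for line in qacct_output.splitlines():
        # Assumed line shape: the key, whitespace, then an integer.
        match = re.match(r"\s*exit_status\s+(-?\d+)", line)
        if match:
            statuses.append(int(match.group(1)))
    return statuses


# Hypothetical, heavily abbreviated sample of qacct -j output.
sample = """\
==============================================================
qname        all.q
jobnumber    1234
failed       0
exit_status  3
"""

print(exit_statuses(sample))
```

On a real system one would call exit_statuses(qacct_text(job_id)) and compare
the result with the exit code GRAM returned for the same job.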

Would someone mind pointing me to the code that reads this file, so I can
understand what format it is expecting?

Any other pointers on how to address this issue would be much appreciated.

Thanks in advance.

--Brendan
