Hi, I have been working with Globus for several months now, and we have had great success using it to drive a Torque/PBS-scheduled cluster.

Recently I have been trying to get our system working with SGE, and I have found that GRAM never reports a failure. It always reports successful completion, even though a qacct -j on the job in question clearly shows a non-zero exit status.

After an initial wrong turn of trying to edit the poll() method in sge.pm, I found that this method is never called. A Google search then brought me to a previous message on this list:

http://www.mail-archive.com/[email protected]/msg03507.html

From that message I learned that Globus depends on the $SGE_ROOT/$SGE_CELL/common/reporting file for its job exit status information. We do have SGE generating this file, and we have verified that it correctly records a non-zero exit status for jobs that GRAM reports as zero.

Would someone mind pointing me to the code that reads this file, so I can understand what format it is expecting? Any other pointers on how to address this issue would be much appreciated.

Thanks in advance.

--Brendan
