Running "$GLOBUS_LOCATION/sbin/gpt-build -nosrc <flavor>" for the
flavor you're trying to build should run the required compiler
detection and create the build-env script that configure is looking
for.
Charles
On Jan 23, 2009, at 5:46 AM, Brendan MacLean wrote:
Hi Jeff,
I had found the code there (thanks for confirming it is a primary
source), and even located the problem in seg_sge_module.c as being
the hard-coded index 13 for the location of the exit-status, where
it is 14 on our system. Indeed, column 13 is zero in all the cases I
have been running, while column 14 is 1 on failure (or zero for
success).
(The ability to configure this index, and a little documentation for
it, would be nice, of course.)
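To illustrate the configurable index suggested above, here is a minimal Python sketch. It is not the module's actual C code. It assumes each record is a single colon-delimited line; the function name, the default index, and the sample record are invented for this example.

```python
def exit_status_from_record(record, index=14, delimiter=":"):
    """Return the integer found in column `index` of one record line.

    Hypothetical helper: the name, the default index and the delimiter
    are assumptions for this sketch, not taken from seg_sge_module.c.
    """
    fields = record.rstrip("\n").split(delimiter)
    if index >= len(fields):
        raise ValueError("record has only %d fields" % len(fields))
    return int(fields[index])


# Made-up record: 16 placeholder columns, with 0 in column 13 and 1 in
# column 14, mirroring what was observed here for a failed job.
sample_fields = ["x"] * 16
sample_fields[13] = "0"
sample_fields[14] = "1"
sample_record = ":".join(sample_fields)

print(exit_status_from_record(sample_record, index=13))
print(exit_status_from_record(sample_record, index=14))
```

Passing the index as a parameter means a site whose records put the exit status in a different column could change one setting rather than rebuild the module.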
Also, for a "job_log::deleted" record, I will want a failure, rather
than the successful completion currently reported. I believe this
works correctly for PBS.
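The behaviour asked for here could be sketched in Python as follows. This is an illustration only, not the module's real logic: the function name, the "finished" event string, and the returned labels are hypothetical; only the "deleted" event comes from the message above.

```python
def job_outcome(event, exit_status=None):
    """Classify a completed job as "failed" or "done".

    Hypothetical rule for this sketch: a "deleted" event is always a
    failure; otherwise a non-zero exit status is a failure.
    """
    if event == "deleted":
        return "failed"
    if exit_status is not None and exit_status != 0:
        return "failed"
    return "done"


print(job_outcome("deleted"))      # deleted jobs count as failures
print(job_outcome("finished", 1))  # non-zero exit status
print(job_outcome("finished", 0))  # clean exit
```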
This may not be a complete solution, but these fixes will be a huge
improvement over what we have. After that we'll see how many
unreported errors we run into.
We have not yet built Globus on this machine. At least it looks
like my sysadmin used a binary install. When I run "configure" for
the globus_scheduler_event_generator_sge-1.1 package, I get a number
of errors about missing dependencies, and finally:
./configure: line 1442: /usr/local/gt/libexec/globus-build-env-.sh:
No such file or directory
Any suggestions on the shortest route to replacing our version of
libglobus_seg_sge_* with one built from an edited seg_sge_module.c
would be much appreciated. Otherwise, I am sure we can muddle
through this today.
Thanks again for the quick response.
--Brendan
On Thu, Jan 22, 2009 at 10:22 PM, Jeff Porter <[email protected]>
wrote:
Hi Brendan,
You can view that code in the seg_sge_module.c file of the
globus_scheduler_event_generator_sge-1.1 package at,
http://www.lesc.ic.ac.uk/projects/SGE-GT4.html
There are variations on this code but I don't know whether anyone
has touched the logic for decoding the reporting file information. I
just looked and it is checking the "acct" record in the reporting
file for errors. However, this doesn't seem to be very complete as
jobs can fail and finish without any updated "acct" record being
written.
Perhaps others have implemented or considered a more robust way to
check for errors? The exit status is available from the "accounting"
file, but that file isn't being parsed in this seg_sge_module.
- Jeff
Brendan MacLean wrote:
Hi,
I have been working with Globus for several months now, and we have
had great success with using it to drive a Torque/PBS scheduled
cluster.
Recently I have been trying to get our system working with SGE,
where I have found that GRAM never reports a failure, but always
successful completion, though a qacct -j on the job in question
clearly reports non-zero exit status. After an initial wrong turn
of trying to edit the poll() method in sge.pm, I
found that this method is never called, and a Google search brought
me to a previous message on this list:
http://www.mail-archive.com/[email protected]/msg03507.html
There I found that Globus depends on the
$SGE_ROOT/$SGE_CELL/common/reporting file for its job exit status
information.
We do have SGE generating this, and we have verified that it
correctly reports non-zero exit status for things that GRAM reports
as zero.
Would someone mind pointing me to the code that reads this file, so
I can understand what format it is expecting?
Any other pointers on how to address this issue would be much
appreciated.
Thanks in advance.
--Brendan