On 03/11/2010 01:33 AM, Joseph Bester wrote:
>> > pbs_log_path=/usr/spool/PBS/server_logs
>> > 
>> > Now, I could run the event-generator without any error.
> Do events for jobs started outside of globus show up in the SEG log file when 
> you run the SEG?

Hi Joe,

Thanks for the reply.

I've now had a fresh read through the documentation and I've realized
I'd skipped one crucial step in the SEG setup:

cd $GLOBUS_LOCATION/setup/globus; ./setup-seg-job-manager.pl

This is where I was breaking the setup.  When
globus-job-manager-event-generator failed with "Error: pbs not
configured", I thought I had hit a bug in the setup scripts, so I
manually edited globus-job-manager-seg.conf and added
"pbs_log_path=/usr/spool/PBS/server_logs".  The correct line, as added
by the setup-seg-job-manager.pl script, is
  pbs_log_path=/opt/globus/var/globus-job-manager-seg-pbs
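
For reference, a minimal sketch of how the value of such a setting could
be read back from a key=value file, to confirm what the setup script
actually wrote.  The helper name conf_value is hypothetical, and the
sketch assumes plain unquoted "key=value" lines like the ones shown in
this message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Globus): print the value of KEY
# from a simple key=value configuration file.
# Usage: conf_value KEY FILE
conf_value() {
    # Keep only lines that begin with "KEY=" (leading whitespace allowed),
    # strip the "KEY=" prefix, and print the last match if there are several.
    sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -n 1
}
```

It could be called as, for example,
  conf_value pbs_log_path /path/to/globus-job-manager-seg.conf
where the path to the file is an assumption that depends on the
installation.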

Now... it still doesn't work: I don't get any events at all.  At least
now, when I try running a job with SEG enabled, the job manager does not
go on a memory-hungry rampage.  I guess what I did wrong was to tell the
job manager to look for LRM-independent logs in the PBS server_logs
directory, which is what was driving it mad.  Uh oh.


But, I'm still stuck.  The event-generator is not processing the PBS
logs at all.  I run it with:

$GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs \
    -pidfile /opt/globus/var/job-manager-seg-pbs.pid

and even with strace and lsof, I can't see it doing any activity at all
- and it's not opening the PBS logs at all.  I can see it does run the C
binary in the background:
/opt/globus/libexec/globus-scheduler-event-generator -s pbs -t 1

But even when I run the event generator with:

export SEG_PBS_DEBUG=255 GLOBUS_ERROR_VERBOSE=1 GLOBUS_ERROR_OUTPUT=1
$GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs \
    -pidfile /opt/globus/var/job-manager-seg-pbs.pid

I only get:

[INFO] Enter globus_l_pbs_increase_buffer
[INFO] Exit globus_l_pbs_increase_buffer

and the output stops there.   Any idea what's wrong?

I do have PBS logs in /usr/spool/PBS/server_logs and my globus-pbs.conf
points there:

log_path=/usr/spool/PBS/server_logs
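
As a sanity check on that directory, here is a minimal sketch that tests
whether today's server log exists and is readable by the current
account.  It assumes the PBS server logs are named by date (YYYYMMDD),
which should be verified against the actual directory listing; the
function name check_pbs_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether today's PBS server log in DIR
# is readable by the account running this check.
# Assumption: log files are named by date, e.g. DIR/20100312.
# Usage: check_pbs_log DIR
check_pbs_log() {
    log="$1/$(date +%Y%m%d)"
    if [ -r "$log" ]; then
        echo "readable: $log"
    else
        echo "missing or unreadable: $log"
        return 1
    fi
}
```

Running check_pbs_log /usr/spool/PBS/server_logs as the same account
that starts the event generator would rule out a simple permissions
problem.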


Any help would be highly appreciated.

Cheers,
Vladimir


> 
>> > However, when I then try running the job with globusrun:
>> > * the job state does not progress
>> > * the globus-job-manager process (running under the local mapped user
>> > account) starts running wild, consuming more and more memory until
>> > taking the machine down (or being killed)
>> > 
>> > In PS output, the process shows as:
>>> >> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type 
>>> >> pbs -seg-module pbs
>> > 
>> > There's not much output in /opt/globus/var/globus-gatekeeper.log even
>> > though I'm running gatekeeper with -debug.
> Is there any information in the Job Manager Log? That file is going to be in 
> $HOME/gram_YYYYMMDD.log by default.

Not much - only a few lines like this one:

ts=2010-03-12T02:39:49.693905Z id=15285 event=gram.query.end level=ERROR
status=-156 uri=/16073769478449478736/123149967014504105/ msg="Unable to
find job for URI" reason="the job contact string does
not match any which the job manager is handling"

> 
> The only time I've seen the huge memory use is when there are many job state 
> files in $GLOBUS_LOCATION/tmp/gram_job_state for the user due to a bug in how 
> the job manager restart code works. Is that the case for you?

I do have a few files there, but they do get pruned - I've seen
/opt/globus/tmp/gram_job_state having 4 files, then going empty, then
having ~ 18....


-- 
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332
