On 03/11/2010 01:33 AM, Joseph Bester wrote:
>> pbs_log_path=/usr/spool/PBS/server_logs
>>
>> Now, I could run the event-generator without any error.
>
> Do events for jobs started outside of globus show up in the SEG log file when
> you run the SEG?

Hi Joe,

Thanks for the reply.

I've now had a fresh read through the documentation and realized I'd skipped one crucial step in the SEG setup:

    cd $GLOBUS_LOCATION/setup/globus; ./setup-seg-job-manager.pl

This is where I was breaking the setup: I thought I had hit a bug in the setup scripts when globus-job-manager-event-generator was failing with "Error: pbs not configured", so I had manually edited globus-job-manager-seg.conf and added

    pbs_log_path=/usr/spool/PBS/server_logs

while the correct line, added by the setup-seg-job-manager.pl script, is

    pbs_log_path=/opt/globus/var/globus-job-manager-seg-pbs

Now.... it still doesn't work. I don't get any events at all.

At least now, when I try running a job with SEG enabled, the job manager does not go on a memory hungry rampage. I guess what I did wrong was to tell jobmanager to look for LRM-independent logs in the PBS server_logs directory - which is what was turning it so mad. Uh oh.

But, I'm still stuck. The event-generator is not processing the PBS logs at all. I run it with:

    $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs -pidfile /opt/globus/var/job-manager-seg-pbs.pid

and even with strace and lsof, I can't see it doing any activity at all - and it's not opening the PBS logs at all.

I can see it does run the C binary in the background:

    /opt/globus/libexec/globus-scheduler-event-generator -s pbs -t 1

But even when I run the event generator with:

    export SEG_PBS_DEBUG=255 GLOBUS_ERROR_VERBOSE=1 GLOBUS_ERROR_OUTPUT=1
    $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs -pidfile /opt/globus/var/job-manager-seg-pbs.pid

I only get:

    [INFO] Enter globus_l_pbs_increase_buffer
    [INFO] Exit globus_l_pbs_increase_buffer

and the output stops there.

Any idea what's wrong? I do have PBS logs in /usr/spool/PBS/server_logs and my globus-pbs.conf points there:

    log_path=/usr/spool/PBS/server_logs

Any help would be highly appreciated.

Cheers,
Vladimir

>> However, when I then try running the job with globusrun:
>> * the job state does not progress
>> * the globus-job-manager process (running under the local mapped user
>>   account) starts running wild, consuming more and more memory until
>>   taking the machine down (or being killed)
>>
>> In PS output, the process shows as:
>>> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type
>>> pbs -seg-module pbs
>>
>> There's not much output in /opt/globus/var/globus-gatekeeper.log even
>> though I'm running gatekeeper with -debug.
>
> Is there any information in the Job Manager Log? That file is going to be in
> $HOME/gram_YYYMMDD.log by default.

Not much - only a few lines like this one:

ts=2010-03-12T02:39:49.693905Z id=15285 event=gram.query.end level=ERROR status=-156 uri=/16073769478449478736/123149967014504105/ msg="Unable to find job for URI" reason="the job contact string does not match any which the job manager is handling"

> The only time I've seen the huge memory use is when there are many job state
> files in $GLOBUS_LOCATION/tmp/gram_job_state for the user due to a bug in how
> the job manager restart code works. Is that the case for you?

I do have a few files there, but they do get pruned - I've seen /opt/globus/tmp/gram_job_state having 4 files, then going empty, then having ~ 18....

-- 
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone:  +64 3 364 3012
Mobile: +64 21 997 352
Fax:    +64 3 364 2332
