Hi, I'm installing GRAM5 (GT 5.0.0) on a CentOS 5 x86_64 system with Torque.
I got the gatekeeper going and I can submit simple jobs fine. I've tried to switch to using the Scheduler Event Generator (SEG), but got stuck on that. I was trying to follow the instructions at http://www.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/admin/#id2545820

* I've run

    /opt/globus/setup/globus/setup-seg-pbs.pl --path /usr/spool/PBS/server_logs

* I've edited $GLOBUS_LOCATION/etc/grid-services/jobmanager-pbs and added "-seg-module pbs" to the list of arguments.

* I've tried running

    $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs -background -pidfile /opt/globus/var/job-manager-seg-pbs.pid

  but it failed with

    Error: pbs not configured

* After looking into the event-generator script, I've added the following line to $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf:

    pbs_log_path=/usr/spool/PBS/server_logs

Now I can run the event generator without any error. However, when I then try running a job with globusrun:

* the job state does not progress
* the globus-job-manager process (running under the locally mapped user account) starts running wild, consuming more and more memory until it takes the machine down (or is killed)

In the ps output, the process shows as:

> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type pbs -seg-module pbs

There's not much output in /opt/globus/var/globus-gatekeeper.log even though I'm running the gatekeeper with -debug.
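The configuration steps above can be double-checked with a short shell sketch. It only confirms that the job manager entry carries "-seg-module pbs" and that pbs_log_path points at a readable directory; it does not explain the memory growth. The function names (has_text, conf_value, check_seg_setup) are made up for illustration, and the sketch assumes the SEG config file uses plain key=value lines like the one added above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the Globus Toolkit.

# Succeeds if file $1 contains the literal string $2.
has_text() {
    grep -qF -- "$2" "$1" 2>/dev/null
}

# Prints the value of key $2 from key=value file $1 (last match if repeated).
# Prints nothing if the key or the file is missing.
conf_value() {
    sed -n "s/^$2=//p" "$1" 2>/dev/null | tail -n 1
}

# $1: job manager service file, $2: SEG config file.
# Returns 0 if both settings described above are in place, 1 otherwise.
check_seg_setup() {
    jm_file=$1
    seg_conf=$2
    status=0

    if ! has_text "$jm_file" "-seg-module pbs"; then
        echo "missing '-seg-module pbs' in $jm_file"
        status=1
    fi

    log_path=$(conf_value "$seg_conf" pbs_log_path)
    if [ -z "$log_path" ]; then
        echo "pbs_log_path not set in $seg_conf"
        status=1
    elif [ ! -d "$log_path" ] || [ ! -r "$log_path" ]; then
        echo "pbs_log_path '$log_path' is not a readable directory"
        status=1
    fi

    return $status
}
```

With the paths from this install, it would be called as check_seg_setup "$GLOBUS_LOCATION/etc/grid-services/jobmanager-pbs" "$GLOBUS_LOCATION/etc/globus-job-manager-seg.conf", ideally as the same user that runs the event generator, since that is the account that needs read access to the PBS server logs.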
> TIME: Wed Mar 10 18:52:00 2010
> PID: 11308 -- Notice: 6: globus-gatekeeper pid=11308 starting at Wed Mar 10 18:52:00 2010
>
> TIME: Wed Mar 10 18:52:00 2010
> PID: 11308 -- Notice: 6: GRAM contact: ng1.canterbury.ac.nz:2119:/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=ng1.canterbury.ac.nz
>
> TIME: Wed Mar 10 18:52:00 2010
> PID: 11308 -- Notice: 0: GATEKEEPER_ACCT_FD=6 (/opt/globus/var/globus-gatekeeper.log)
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 6: Got connection 132.181.39.11 at Wed Mar 10 18:52:31 2010
>
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 5: Authenticated globus user: /C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 0: GATEKEEPER_JM_ID 2010-03-10.18:52:31.0000011308.0000000001 for /C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl on 132.181.39.11
> PID: 11353 -- PRIMA INFO ts=2010-03-10T18:52:31+12:00 event=org.osg.prima.authz.start DN="/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl" Service_URL="https://nggums.canterbury.ac.nz:8443/gums/services/GUMSAuthorizationServicePort"
> PID: 11353 -- PRIMA INFO ts=2010-03-10T18:52:31+12:00 event=org.osg.prima.authz.end status=0 decision=PERMIT DN="/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl" Service_URL="https://nggums.canterbury.ac.nz:8443/gums/services/GUMSAuthorizationServicePort" local_user=grid-bestgrid
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 5: Requested service: jobmanager
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 5: Authorized as local user: grid-bestgrid
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 5: Authorized as local uid: 95008
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 5: and local gid: 95008
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 0: executing /opt/globus/libexec/globus-job-manager
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=12
> ts=2010-03-10T05:52:31Z id=11354 event=gram_gsi_get_subject.start level=TRACE
> TIME: Wed Mar 10 18:52:31 2010
> PID: 11353 -- Notice: 0: Child 11354 started
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 2010-03-10.18:52:31.0000011308.0000000001 for /C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl on 132.181.39.11
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 2010-03-10.18:52:31.0000011308.0000000001 mapped to grid-bestgrid (95008, 95008)
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 2010-03-10.18:52:31.0000011308.0000000001 has GRAM_SCRIPT_JOB_ID 5065.ngcompute.canterbury.ac.nz manager type pbs

Any idea what I'm doing wrong? Any help would be highly appreciated.

Cheers,
Vladimir

PS: The detailed notes on how I've set up the system are at http://www.bestgrid.org/index.php/Setup_GRAM5_on_CentOS_5

--
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332
