Hi,

I'm installing GRAM5 (GT 5.0.0) on a CentOS 5 x86_64 system with Torque.

I got the gatekeeper running and I can submit simple jobs without problems.

I've tried to switch to using the Scheduler Event Generator (SEG), but
got stuck:

I was trying to follow the instructions on
http://www.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/admin/#id2545820

* I've run
  /opt/globus/setup/globus/setup-seg-pbs.pl --path /usr/spool/PBS/server_logs
* I've edited $GLOBUS_LOCATION/etc/grid-services/jobmanager-pbs and
added "-seg-module pbs" to the list of arguments.

* I've tried running
  $GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler pbs -background -pidfile /opt/globus/var/job-manager-seg-pbs.pid
but it failed with
   Error: pbs not configured

* After looking into the event-generator script, I've added the
following line to $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf

pbs_log_path=/usr/spool/PBS/server_logs

With that line in place, I could run the event generator without any error.
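
For anyone reproducing this, the check I effectively did by hand can be
sketched as a small shell helper. This is only an illustration: the
function names seg_log_path and seg_check are made up here, and the only
thing taken from my setup is the pbs_log_path=... line format shown above.
It does not claim to mirror what the event-generator script does internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of Globus): print the value of
# <scheduler>_log_path from a SEG conf file, using the last matching line.
seg_log_path() {
    conf_file=$1
    scheduler=$2
    # Strip the "<scheduler>_log_path=" prefix and print what remains.
    sed -n "s/^${scheduler}_log_path=//p" "$conf_file" | tail -n 1
}

# Hypothetical helper: succeed only if the configured path exists and is
# a readable directory; print the path on success.
seg_check() {
    path=$(seg_log_path "$1" "$2")
    if [ -z "$path" ]; then
        echo "Error: $2 not configured in $1" >&2
        return 1
    fi
    if [ ! -d "$path" ] || [ ! -r "$path" ]; then
        echo "Error: $path is not a readable directory" >&2
        return 2
    fi
    echo "$path"
}
```

On my system this would be called as
  seg_check $GLOBUS_LOCATION/etc/globus-job-manager-seg.conf pbs
and should be run as the account that runs the event generator, since that
is the account that needs to read the PBS server logs.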

However, when I then submit a job with globusrun:
* the job state does not progress
* the globus-job-manager process (running under the local mapped user
account) runs away, consuming more and more memory until it takes the
machine down (or is killed)
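
To catch the runaway process before it takes the machine down, something
like the following could be used while debugging. It is a rough sketch: the
helper name over_limit and the 1 GB threshold are my own choices for
illustration, and the usage line assumes the Linux procps version of ps.

```shell
#!/bin/sh
# Hypothetical helper: read "PID RSS_KB" lines on stdin and print the
# PIDs whose resident memory exceeds the given limit (in KB).
over_limit() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Usage sketch (assumes procps ps; 1048576 KB = 1 GB):
#   ps -C globus-job-manager -o pid=,rss= | over_limit 1048576
```

The printed PIDs can then be inspected or killed by hand; the helper itself
only reports them.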

In the ps output, the process shows as:
> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type pbs -seg-module pbs

There's not much output in /opt/globus/var/globus-gatekeeper.log even
though I'm running gatekeeper with -debug.

> TIME: Wed Mar 10 18:52:00 2010
>  PID: 11308 -- Notice: 6: globus-gatekeeper pid=11308 starting at Wed Mar 10 
> 18:52:00 2010
> 
> TIME: Wed Mar 10 18:52:00 2010
>  PID: 11308 -- Notice: 6: GRAM contact: 
> ng1.canterbury.ac.nz:2119:/C=NZ/O=BeSTGRID/OU=University of 
> Canterbury/CN=ng1.canterbury.ac.nz
> 
> TIME: Wed Mar 10 18:52:00 2010
>  PID: 11308 -- Notice: 0: GATEKEEPER_ACCT_FD=6 
> (/opt/globus/var/globus-gatekeeper.log)
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 6: Got connection 132.181.39.11 at Wed Mar 10 18:52:31 
> 2010
> 
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 5: Authenticated globus user: 
> /C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 0: GATEKEEPER_JM_ID 
> 2010-03-10.18:52:31.0000011308.0000000001 for /C=NZ/O=BeSTGRID/OU=University 
> of Canterbury/CN=Vladimir Mencl on 132.181.39.11
>  PID: 11353 -- PRIMA INFO ts=2010-03-10T18:52:31+12:00 
> event=org.osg.prima.authz.start DN="/C=NZ/O=BeSTGRID/OU=University of 
> Canterbury/CN=Vladimir Mencl" 
> Service_URL="https://nggums.canterbury.ac.nz:8443/gums/services/GUMSAuthorizationServicePort";
>  PID: 11353 -- PRIMA INFO ts=2010-03-10T18:52:31+12:00 
> event=org.osg.prima.authz.end status=0 decision=PERMIT 
> DN="/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl" 
> Service_URL="https://nggums.canterbury.ac.nz:8443/gums/services/GUMSAuthorizationServicePort";
>  local_user=grid-bestgrid
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 5: Requested service: jobmanager 
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 5: Authorized as local user: grid-bestgrid
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 5: Authorized as local uid: 95008
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 5:           and local gid: 95008
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 0: executing /opt/globus/libexec/globus-job-manager
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=12
> ts=2010-03-10T05:52:31Z id=11354 event=gram_gsi_get_subject.start level=TRACE 
> TIME: Wed Mar 10 18:52:31 2010
>  PID: 11353 -- Notice: 0: Child 11354 started
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 
> 2010-03-10.18:52:31.0000011308.0000000001 for /C=NZ/O=BeSTGRID/OU=University 
> of Canterbury/CN=Vladimir Mencl on 132.181.39.11
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 
> 2010-03-10.18:52:31.0000011308.0000000001 mapped to grid-bestgrid (95008, 
> 95008)
> JMA 2010/03/10 18:52:33 GATEKEEPER_JM_ID 
> 2010-03-10.18:52:31.0000011308.0000000001 has GRAM_SCRIPT_JOB_ID 
> 5065.ngcompute.canterbury.ac.nz manager type pbs


Any idea what I'm doing wrong?

Any help would be highly appreciated.


Cheers,
Vladimir

PS: The detailed notes on how I've set up the system are at
http://www.bestgrid.org/index.php/Setup_GRAM5_on_CentOS_5


-- 
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332
