Ralph,

As I said, this is NOT a cluster - it is a 4k-core shared-memory machine.
TORQUE is allocating cpus (time-shared mode, IIRC), not nodes. So, there
is always exactly one line in $PBS_NODEFILE.
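To illustrate, this is roughly what a job on this machine sees (values
reconstructed from the env dump quoted below, so take the exact output as
approximate):

    $ wc -l < $PBS_NODEFILE
    1
    $ echo "$PBS_PPN $PBS_NCPUS"
    16 16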
The system runs as 2 partitions of 2k cores each. So, the contents of
$PBS_NODEFILE have exactly 2 possible values, each 1 line.

The values of PBS_PPN and PBS_NCPUS both reflect the size of the
allocation. At a minimum, shouldn't Open MPI be multiplying the lines in
$PBS_NODEFILE by the value of $PBS_PPN? (A sketch of the computation I
have in mind is in the P.S. at the end of this message.)

Additionally, when I try "mpirun -npernode 16 ./ring_c" I am still told
there are not enough slots. Shouldn't that be working with the single
line in $PBS_NODEFILE?

-Paul


On Fri, Jan 31, 2014 at 2:47 PM, Ralph Castain <r...@open-mpi.org> wrote:

> We read the nodes from the PBS_NODEFILE, Paul - can you pass that along?
>
> On Jan 31, 2014, at 2:33 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I am trying to test the trunk on an SGI UV (to validate Nathan's port of
> btl:vader to SGI's variant of xpmem).
>
> At configure time, PBS's TM support was correctly located.
>
> My PBS batch script includes
>     #PBS -l ncpus=16
> because that is what this installation requires (not nodes, mppnodes, or
> anything like that). One is allocating cpus on a large shared-memory
> machine, not a set of nodes in a cluster.
>
> However, this appears to be causing mpirun to think I have just 1 slot:
>
> + mpirun -np 2 ./ring_c
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2 slots
> that were requested by the application:
>   ./ring_c
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
>
> In case they contain useful info, here are the PBS env vars in the job:
>
> PBS_HT_NCPUS=32
> PBS_VERSION=TORQUE-2.3.13
> PBS_JOBNAME=qs
> PBS_ENVIRONMENT=PBS_BATCH
> PBS_HOME=/var/spool/torque
> PBS_O_WORKDIR=/usr/users/6/hargrove/SCRATCH/OMPI/openmpi-trunk-linux-x86_64-uv-trunk/BLD/examples
> PBS_PPN=16
> PBS_TASKNUM=1
> PBS_O_HOME=/usr/users/6/hargrove
> PBS_MOMPORT=15003
> PBS_O_QUEUE=debug
> PBS_O_LOGNAME=hargrove
> PBS_O_LANG=en_US.UTF-8
> PBS_JOBCOOKIE=9EEF5DF75FA705A241FEF66EDFE01C5B
> PBS_NODENUM=0
> PBS_O_SHELL=/usr/psc/shells/bash
> PBS_SERVER=tg-login1.blacklight.psc.teragrid.org
> PBS_JOBID=314827.tg-login1.blacklight.psc.teragrid.org
> PBS_NCPUS=16
> PBS_O_HOST=tg-login1.blacklight.psc.teragrid.org
> PBS_VNODENUM=0
> PBS_QUEUE=debug_r1
> PBS_O_MAIL=/var/mail/hargrove
> PBS_NODEFILE=/var/spool/torque/aux//314827.tg-login1.blacklight.psc.teragrid.org
> PBS_O_PATH=[...removed...]
>
> If any additional info is needed to help make mpirun "just work", please
> let me know.
>
> However, at this point I am mostly interested in any work-arounds that
> will let me run something other than a singleton on this system.
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
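P.S. For concreteness, a minimal sketch of the computation I am suggesting
above - just shell arithmetic over the values shown in the env dump, not a
claim about how Open MPI's mapper is actually implemented:

    # Sketch only: treat the slot count as
    #   (lines in $PBS_NODEFILE) * $PBS_PPN
    # which on this allocation is 1 * 16 = 16.
    slots=$(( $(wc -l < "$PBS_NODEFILE") * PBS_PPN ))
    echo "expected slots: $slots"

With a count like that, both "mpirun -np 2 ./ring_c" and
"mpirun -npernode 16 ./ring_c" would fit within the allocation.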