On Jan 31, 2014, at 3:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Ralph,
>
> As I said, this is NOT a cluster - it is a 4k-core shared memory machine.

I understood - that wasn't the nature of my question.

> TORQUE is allocating cpus (time-shared mode, IIRC), not nodes.
> So, there is always exactly one line in $PBS_NODEFILE.

Interesting - because that isn't the standard way Torque behaves. It is supposed to put one line per slot in the nodefile, each line containing the name of the node. Clearly, SGI has reconfigured Torque to do something different.

> The system runs as 2 partitions of 2k-cores each.
> So, the contents of $PBS_NODEFILE have exactly 2 possible values, each 1 line.
>
> The values of PBS_PPN and PBS_NCPUS both reflect the size of the allocation.
>
> At a minimum, shouldn't Open MPI be multiplying the lines in $PBS_NODEFILE
> by the value of $PBS_PPN?

No, as above, that isn't the way Torque generally behaves. It would appear that we need a "switch" here to handle SGI's modifications. Should be doable - just haven't had anyone using an SGI machine before :-)

> Additionally, when I try "mpirun -npernode 16 ./ring_c" I am still told there
> are not enough slots.
> Shouldn't that be working with 1 line in $PBS_NODEFILE?
>
> -Paul
>
> On Fri, Jan 31, 2014 at 2:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
> We read the nodes from the PBS_NODEFILE, Paul - can you pass that along?
>
> On Jan 31, 2014, at 2:33 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> I am trying to test the trunk on an SGI UV (to validate Nathan's port of
>> btl:vader to SGI's variant of xpmem).
>>
>> At configure time, PBS's TM support was correctly located.
>>
>> My PBS batch script includes
>>    #PBS -l ncpus=16
>> because that is what this installation requires (not nodes, mppnodes, or
>> anything like that).
>> One is allocating cpus on a large shared-memory machine, not a set of nodes
>> in a cluster.
>>
>> However, this appears to be causing mpirun to think I have just 1 slot:
>>
>> + mpirun -np 2 ./ring_c
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 2 slots
>> that were requested by the application:
>>    ./ring_c
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>>
>> In case they contain useful info, here are the PBS env vars in the job:
>>
>> PBS_HT_NCPUS=32
>> PBS_VERSION=TORQUE-2.3.13
>> PBS_JOBNAME=qs
>> PBS_ENVIRONMENT=PBS_BATCH
>> PBS_HOME=/var/spool/torque
>> PBS_O_WORKDIR=/usr/users/6/hargrove/SCRATCH/OMPI/openmpi-trunk-linux-x86_64-uv-trunk/BLD/examples
>> PBS_PPN=16
>> PBS_TASKNUM=1
>> PBS_O_HOME=/usr/users/6/hargrove
>> PBS_MOMPORT=15003
>> PBS_O_QUEUE=debug
>> PBS_O_LOGNAME=hargrove
>> PBS_O_LANG=en_US.UTF-8
>> PBS_JOBCOOKIE=9EEF5DF75FA705A241FEF66EDFE01C5B
>> PBS_NODENUM=0
>> PBS_O_SHELL=/usr/psc/shells/bash
>> PBS_SERVER=tg-login1.blacklight.psc.teragrid.org
>> PBS_JOBID=314827.tg-login1.blacklight.psc.teragrid.org
>> PBS_NCPUS=16
>> PBS_O_HOST=tg-login1.blacklight.psc.teragrid.org
>> PBS_VNODENUM=0
>> PBS_QUEUE=debug_r1
>> PBS_O_MAIL=/var/mail/hargrove
>> PBS_NODEFILE=/var/spool/torque/aux//314827.tg-login1.blacklight.psc.teragrid.org
>> PBS_O_PATH=[...removed...]
>>
>> If any additional info is needed to help make mpirun "just work", please let
>> me know.
>>
>> However, at this point I am mostly interested in any work-arounds that will
>> let me run something other than a singleton on this system.
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
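For illustration only, the "switch" Ralph mentions amounts to deciding how to count slots from the Torque allocation: standard Torque writes one line per slot to $PBS_NODEFILE, while this SGI configuration writes a single line and carries the slot count in $PBS_PPN. A rough, hypothetical shell sketch of that decision (not the actual Open MPI allocation code, and untested on the UV) might look like:

    # Hypothetical sketch only -- not the actual Open MPI ras/tm code.
    # Standard Torque: one line per slot in $PBS_NODEFILE.
    # SGI UV variant reported above: a single line, slot count in $PBS_PPN.
    lines=$(wc -l < "$PBS_NODEFILE")
    if [ "$lines" -eq 1 ] && [ -n "$PBS_PPN" ] && [ "$PBS_PPN" -gt 1 ]; then
        slots=$PBS_PPN          # SGI-style allocation
    else
        slots=$lines            # standard Torque behavior
    fi
    echo "$(head -1 "$PBS_NODEFILE"): $slots slot(s)"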
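As for work-arounds until such a switch exists, one untested possibility is to hand mpirun an explicit hostfile declaring the slot count, using the standard "slots=N" hostfile syntax. Whether the hostfile is honored alongside the TM-derived allocation may depend on the Open MPI version in use, and the file name here is arbitrary:

    # Untested work-around idea: declare the slots explicitly.
    echo "$(hostname) slots=16" > uv_hosts
    mpirun --hostfile uv_hosts -np 2 ./ring_c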