I tracked it down - it isn't Torque-specific, but it impacts all managed environments. Will fix
On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph,
>
> I saw another hang-up with openmpi-1.8 when I used more than 4 nodes
> (with 8 cores each) in a managed state under Torque. I'm not sure whether
> you can reproduce it with SLURM, but at least with Torque it can be
> reproduced this way:
>
> [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
> qsub: waiting for job 8726.manage.cluster to start
> qsub: job 8726.manage.cluster ready
>
> [mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 65 slots
> that were requested by the application:
>   /home/mishima/mis/openmpi/demos/myprog
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
> <<< HANG HERE!! >>>
> Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
> terminate
>
> I found this behavior when I happened to enter the wrong number of procs.
> With fewer than 4 nodes, or over rsh (i.e. an unmanaged state), it works.
> I'm afraid I have no idea how to resolve it; I hope you can fix the problem.
>
> Tetsuya
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives:
> http://www.open-mpi.org/community/lists/devel/2014/04/index.php
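
For anyone trying to reproduce this: the demo binary at ~/mis/openmpi/demos/myprog isn't included in the report, but any trivial MPI program should show the same behavior, since mpirun rejects the 65-rank request against the 4 x 8 = 32-slot Torque allocation before the application ever launches and then hangs while aborting. A minimal stand-in (hypothetical, not Tetsuya's actual demo):

/* myprog stand-in: a trivial MPI hello-world. The hang happens in
 * mpirun itself, before any of this code runs, because 65 ranks were
 * requested against a 32-slot (4 nodes x 8 ppn) allocation. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Build it with "mpicc -o myprog myprog.c" from the same openmpi-1.8 install, then run "mpirun -np 65 ./myprog" inside the qsub -I -l nodes=4:ppn=8 session; with -np 32 or fewer it should run normally.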