Hi Ralph,

I saw another hang with openmpi-1.8 when I used more than 4 nodes (8 cores each) in a managed state under Torque. I'm not sure whether you can reproduce it with SLURM, but at least with Torque it can be reproduced this way:
[mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
qsub: waiting for job 8726.manage.cluster to start
qsub: job 8726.manage.cluster ready

[mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 65 slots
that were requested by the application:
  /home/mishima/mis/openmpi/demos/myprog

Either request fewer slots for your application, or make more slots
available for use.
--------------------------------------------------------------------------
<<< HANG HERE!! >>>
Abort is in progress...hit ctrl-c again within 5 seconds to forcibly terminate

I found this behavior when I happened to request the wrong number of procs. With fewer than 4 nodes, or with rsh (that is, in an unmanaged state), it works. I'm afraid I have no idea how to resolve it; I hope you can fix the problem.

Tetsuya
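P.S. For reference, the allocation above grants 4 nodes x 8 ppn = 32 slots, so any -np above 32 should produce the "not enough slots" error and then exit cleanly rather than hang. A minimal sketch of that arithmetic (the node/ppn numbers are taken from the qsub line above; the threshold check is just an illustration, not Open MPI's actual logic):

```shell
#!/bin/sh
# Slots granted by the Torque allocation above: 4 nodes x 8 cores each.
NODES=4
PPN=8
SLOTS=$((NODES * PPN))   # 32 slots total

# Requesting more ranks than slots (as with "mpirun -np 65" above) should
# make mpirun report the oversubscription error and terminate -- not hang.
NP=65
if [ "$NP" -gt "$SLOTS" ]; then
    echo "requested $NP ranks but only $SLOTS slots are available"
fi
```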