Hi Ralph,

I saw another hangup with openmpi-1.8 when I used more than 4 nodes
(8 cores each) in the managed state under Torque. I'm not sure whether
you can reproduce it with SLURM, but at least with Torque it can be
reproduced this way:

[mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
qsub: waiting for job 8726.manage.cluster to start
qsub: job 8726.manage.cluster ready

[mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 65 slots
that were requested by the application:
  /home/mishima/mis/openmpi/demos/myprog

Either request fewer slots for your application, or make more slots
available
for use.
--------------------------------------------------------------------------
<<< HANG HERE!! >>>
Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
terminate

I found this behavior when I accidentally requested the wrong number of
procs. With fewer than 4 nodes, or with rsh (i.e. the unmanaged state),
it works fine. I'm afraid I have no idea how to resolve it. I hope you
can fix the problem.
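For reference, a quick way to confirm how many slots Torque actually
granted is to count the lines of the nodefile it exports via the
PBS_NODEFILE environment variable (one line per slot, so nodes=4:ppn=8
gives 32). The sketch below simulates that file with a hypothetical
path, since the real one only exists inside a running job:

```shell
# Simulate the nodefile Torque would write for nodes=4:ppn=8
# (hypothetical path; inside a job use "$PBS_NODEFILE" instead).
for node in node09 node10 node11 node12; do
    for i in 1 2 3 4 5 6 7 8; do
        echo "$node"
    done
done > /tmp/fake_nodefile

# One line per slot, so this prints the allocation size: 32.
# Any "mpirun -np N" with N <= 32 stays within the allocation.
wc -l < /tmp/fake_nodefile
```

So in the session above, -np 65 exceeds the 32 granted slots, which is
what triggers the error message; only the subsequent hang is the bug.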

Tetsuya
