I tracked it down - it isn't Torque-specific, but it impacts all managed environments. Will fix
On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote:
>
> Hi Ralph,
>
> I saw another hang-up with openmpi-1.8 when I used more than 4 nodes
> (with 8 cores each) in a managed state under Torque. I'm not sure whether
> you can reproduce it with SLURM, but at least with Torque it can be
> reproduced this way:
>
> [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
> qsub: waiting for job 8726.manage.cluster to start
> qsub: job 8726.manage.cluster ready
>
> [mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 65 slots
> that were requested by the application:
>   /home/mishima/mis/openmpi/demos/myprog
>
> Either request fewer slots for your application, or make more slots
> available for use.
> --------------------------------------------------------------------------
> <<< HANG HERE!! >>>
> Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
> terminate
>
> I found this behavior when I happened to enter the wrong number of procs.
> With fewer than 4 nodes, or over rsh (i.e. an unmanaged state), it works.
> I'm afraid I have no idea how to resolve it; I hope you can fix the problem.
>
> Tetsuya
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Searchable archives:
> http://www.open-mpi.org/community/lists/devel/2014/04/index.php
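
For anyone trying to reproduce this: the demo binary at ~/mis/openmpi/demos/myprog isn't included in the report, but any trivial MPI program should show the same behavior, since mpirun rejects the 65-rank request against the 4 x 8 = 32-slot Torque allocation before the application ever launches and then hangs while aborting. A minimal stand-in (hypothetical, not Tetsuya's actual demo):

/* myprog stand-in: a trivial MPI hello-world. The hang happens in
 * mpirun itself, before any of this code runs, because 65 ranks were
 * requested against a 32-slot (4 nodes x 8 ppn) allocation. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Build it with "mpicc -o myprog myprog.c" from the same openmpi-1.8 install, then run "mpirun -np 65 ./myprog" inside the qsub -I -l nodes=4:ppn=8 session; with -np 32 or fewer it should run normally.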