Brian,

The ability to control the number of available QPs will vary by vendor. Unless things have changed in recent years, Mellanox's firmware tools allow one to modify the limit, but at the inconvenience of reburning the firmware. I know of no other way, and I know nothing about other vendors.
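
To check whether the job is plausibly hitting the per-device limit at all, a rough worst-case estimate may help. The sketch below is only back-of-the-envelope arithmetic: it assumes every process ends up connected to every off-node process (connections are normally set up lazily, so real usage can be lower) and that on-node peers do not consume QPs. The job size, processes per node, and device limit are hypothetical placeholders, not values from your cluster; the real limit should be read from the HCA (for example, the max_qp field that `ibv_devinfo -v` reports).

```python
def estimate_qps_per_node(total_procs, procs_per_node, qps_per_peer):
    """Worst-case number of QPs one node must create, assuming every
    local process connects to every process on the other nodes."""
    remote_peers = total_procs - procs_per_node
    return procs_per_node * remote_peers * qps_per_peer


# Hypothetical job shape and device limit, for illustration only.
TOTAL_PROCS = 4096
PROCS_PER_NODE = 8
DEVICE_MAX_QP = 65536  # assumed value; read the real one from your HCA

for qps_per_peer in (4, 2):  # default of 4 QPs per peer vs. the reduced 2
    need = estimate_qps_per_node(TOTAL_PROCS, PROCS_PER_NODE, qps_per_peer)
    status = "exceeds" if need > DEVICE_MAX_QP else "fits within"
    print(f"{qps_per_peer} QPs/peer: {need} QPs per node, "
          f"{status} assumed limit {DEVICE_MAX_QP}")
```

With these made-up numbers, halving the QPs per peer is enough to bring the node back under the limit, which would be consistent with the workaround you describe. It does not prove that QP exhaustion is the cause.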

-Paul

On 1/27/2011 2:56 PM, Barrett, Brian W wrote:
All -

On one of our clusters, we're seeing the following on one of our applications, 
I believe using Open MPI 1.4.3:

[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error creating qp errno says Resource temporarily unavailable
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


The problem goes away if we modify the eager protocol msg sizes so that there 
are only two QPs necessary instead of the default 4.  Is there a way to bump up 
the number of QPs that can be created on a node, assuming the issue is just 
running out of available QPs?  If not, any other thoughts on working around the 
problem?

Thanks,

Brian

--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories





_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
