Brian,
The ability to control the number of available QPs will vary by
vendor. Unless things have changed in recent years, Mellanox's firmware
tools allow one to modify the limit, but at the inconvenience of
reburning the firmware. I know of no other way, and I know nothing about
other vendors.
-Paul
On 1/27/2011 2:56 PM, Barrett, Brian W wrote:
All -
On one of our clusters, we're seeing the following error with one of our
applications, which I believe is using Open MPI 1.4.3:
[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error
creating qp errno says Resource temporarily unavailable
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
The problem goes away if we modify the eager protocol msg sizes so that there
are only two QPs necessary instead of the default 4. Is there a way to bump up
the number of QPs that can be created on a node, assuming the issue is just
running out of available QPs? If not, any other thoughts on working around the
problem?
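
As a rough illustration of why dropping from 4 QPs to 2 per connection could
make the difference, here is a back-of-the-envelope sketch. Everything in it
is an assumption made for illustration, not a measurement from the cluster
above: the function name, the job size, the processes-per-node count, and the
adapter limit are all hypothetical, and it assumes a worst case in which every
process on the node opens a connection to every process on other nodes.

```python
def qps_needed_per_node(procs_per_node, total_procs, qps_per_connection):
    """Worst-case QP count on one node (hypothetical helper).

    Assumes each local process connects to every off-node process and
    that on-node peers do not consume adapter QPs.
    """
    remote_peers = total_procs - procs_per_node
    return procs_per_node * remote_peers * qps_per_connection


# Hypothetical job and adapter limit; substitute real values.
procs_per_node = 16
total_procs = 2048
assumed_qp_limit = 65536  # assumption, not a vendor-documented figure

for qps_per_connection in (4, 2):
    needed = qps_needed_per_node(procs_per_node, total_procs, qps_per_connection)
    status = "fits" if needed <= assumed_qp_limit else "exceeds limit"
    print(f"{qps_per_connection} QPs/connection: {needed} QPs needed ({status})")
```

Under these assumed numbers the 4-QP case overshoots the limit while the 2-QP
case fits, which would be consistent with the behavior described above. Actual
demand depends on how many connections the application really establishes.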
Thanks,
Brian
--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900