Good point, Paul.
I love XRC :-)
You may try switching the default receive-queue configuration to XRC:
--mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
If XRC is not supported on your platform, Open MPI should report a helpful
error message. BTW, on multi-core systems XRC should show better performance.
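
For example, a minimal command line might look like this (a sketch only - the
BTL list and the application name are placeholders for your setup):

  # switch all four receive queues from the default per-peer/SRQ types to XRC
  mpirun --mca btl openib,sm,self \
      --mca btl_openib_receive_queues \
      X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
      ./your_app
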
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jan 27, 2011, at 8:19 PM, Paul H. Hargrove wrote:
> Brian,
>
> As Pasha said:
>> You can see the maximum number of supported QPs in ibv_devinfo.
>
> However you'll probably need "-v":
>
> {hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
> max_qp: 261056
>
> If you really are running out of QPs due to the "fatness" of the node,
> then you should definitely look at enabling XRC, if your HCA and
> libibverbs version support it. ibv_devinfo can query the HCA capability:
>
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep port_cap_flags:
> port_cap_flags: 0x02510868
>
> and look for bit 0x00100000 ( == 1<<20).
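>
> For example, a quick shell check against the sample value above (a sketch;
> substitute your own port_cap_flags output):
>
>   # prints 0x00100000 (non-zero) when the XRC capability bit is set
>   printf '0x%08x\n' $(( 0x02510868 & 0x00100000 ))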
>
> -Paul
>
>
>
> On 1/27/2011 5:09 PM, Barrett, Brian W wrote:
>> Pasha -
>>
>> Is there a way to tell which of the two happened or to check the number of
>> QPs available per node? The app likely does talk to a large number of peers
>> from each process, and the nodes are fairly "fat" - they're quad-socket,
>> quad-core and are running 16 MPI ranks per node.
>>
>> Brian
>>
>> On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:
>>
>>> Unfortunately the verbose error reports are not so friendly... anyway, I can
>>> think of two possible issues:
>>>
>>> 1. You are trying to open too many QPs. By default, IB devices support a
>>> fairly large number of QPs, and it is quite hard to hit this limit. But if
>>> your job is really huge it may be the case, or, for example, if you share
>>> the compute nodes with other processes that create a lot of QPs. You can
>>> see the maximum number of supported QPs in ibv_devinfo.
>>>
>>> 2. The limit on registered memory is too low, so the driver fails to
>>> allocate and register memory for the QP. This scenario is the most common;
>>> it just happened to me recently when the system folks pushed some bad
>>> settings into limits.conf. (Quick shell checks for both are sketched below.)
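>>>
>>> For example (a minimal sketch; the memlock limit should normally be
>>> "unlimited" on compute nodes):
>>>
>>>   # 1. maximum number of QPs the HCA supports (needs -v for extended attrs)
>>>   ibv_devinfo -v | grep max_qp:
>>>
>>>   # 2. registered-memory (memlock) limit; check it through mpirun as well,
>>>   #    since the limit of the launched processes is what actually matters
>>>   ulimit -l
>>>   mpirun -np 1 sh -c 'ulimit -l'
>>>   grep memlock /etc/security/limits.conf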
>>>
>>> Regards,
>>>
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>>
>>> On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:
>>>
>>>> All -
>>>>
>>>> On one of our clusters, we're seeing the following on one of our
>>>> applications, I believe using Open MPI 1.4.3:
>>>>
>>>> [xxx:27545] *** An error occurred in MPI_Scatterv
>>>> [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
>>>> [xxx:27545] *** MPI_ERR_OTHER: known error not in list
>>>> [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one]
>>>> error creating qp errno says Resource temporarily unavailable
>>>> --------------------------------------------------------------------------
>>>> mpirun has exited due to process rank 0 with PID 27545 on
>>>> node rs1891 exiting without calling "finalize". This may
>>>> have caused other processes in the application to be
>>>> terminated by signals sent by mpirun (as reported here).
>>>> --------------------------------------------------------------------------
>>>>
>>>>
>>>> The problem goes away if we modify the eager-protocol message sizes so
>>>> that only two QPs are needed instead of the default four. Is there a way
>>>> to bump up the number of QPs that can be created on a node, assuming the
>>>> issue is just running out of available QPs? If not, any other thoughts on
>>>> working around the problem?
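>>>>
>>>> (For reference, by "two QPs" I mean a receive_queues setting along these
>>>> lines - the thresholds below are illustrative, not necessarily the exact
>>>> values we used, and your_app is a placeholder:
>>>>
>>>>   mpirun --mca btl_openib_receive_queues \
>>>>       P,128,256,192,128:S,65536,256,128,32 ./your_app
>>>> )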
>>>>
>>>> Thanks,
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>>
>>>
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
>>
>
> --
> Paul H. Hargrove [email protected]
> Future Technologies Group
> HPC Research Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel