I don't have any IB nodes, but I'm interested to see how this happens. What I 
would like to understand here is how we get back into the OpenIB code if 
add_procs failed for the BTL ...

  george.

On Jun 2, 2010, at 05:08 , Sylvain Jeaugey wrote:

> On Tue, 1 Jun 2010, Jeff Squyres wrote:
> 
>> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>> 
>>> In my case, the error happens in :
>>>   mca_btl_openib_add_procs()
>>>     mca_btl_openib_size_queues()
>>>       adjust_cq()
>>>         ibv_create_cq_compat()
>>>           ibv_create_cq()
>> 
>> Can you nail this down any further?  If I modify adjust_cq() to always 
>> return OMPI_ERROR, I see the openib BTL fail over properly to the TCP BTL.
> It must be because create_cq actually creates cqs. Try to apply this patch, 
> which makes create_cq_compat() *not* create the cqs and return an error 
> instead:
> ========================================================================
> diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
> --- a/ompi/mca/btl/openib/btl_openib.c  Fri May 28 14:50:25 2010 +0200
> +++ b/ompi/mca/btl/openib/btl_openib.c  Wed Jun 02 10:56:57 2010 +0200
> @@ -146,6 +146,7 @@
>         int cqe, void *cq_context, struct ibv_comp_channel *channel,
>         int comp_vector)
> {
> +    return OMPI_ERROR;
> #if OMPI_IBV_CREATE_CQ_ARGS == 3
>     return ibv_create_cq(context, cqe, channel);
> #else
> ========================================================================
> 
> You should see MPI_Init complete nicely and your application segfault on the 
> next MPI operation.
> 
> Sylvain
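
To make the scenario above easier to reason about, here is a minimal sketch of 
the error path under discussion: a failure deep in the call chain is returned 
up to the add_procs level, the module is marked unusable, and the caller falls 
back to another transport. Everything named sketch_* is hypothetical and only 
mirrors the shape of the call chain quoted above; it is not Open MPI code, and 
it assumes a failed queue creation is reported as a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR  (-1)

/* Hypothetical transport identifiers, for illustration only. */
enum sketch_transport { SKETCH_FAST = 0, SKETCH_FALLBACK = 1 };

/* Hypothetical stand-in for a transport module; not an Open MPI type. */
struct sketch_module {
    void *cq;     /* NULL until the completion queue exists */
    int   usable; /* set only once add_procs fully succeeded */
};

/* Dummy object whose address plays the role of a valid queue handle. */
static int sketch_dummy_cq;

/* Plays the role of the patched creation call: fails on request. */
static void *sketch_create_cq(int force_failure)
{
    return force_failure ? NULL : (void *)&sketch_dummy_cq;
}

/* Translates a NULL queue into an error code for the caller. */
static int sketch_adjust_cq(struct sketch_module *m, int force_failure)
{
    m->cq = sketch_create_cq(force_failure);
    return (NULL == m->cq) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

/* Propagates the error and never marks a half-built module usable. */
static int sketch_add_procs(struct sketch_module *m, int force_failure)
{
    int rc;

    assert(NULL != m);
    rc = sketch_adjust_cq(m, force_failure);
    if (SKETCH_SUCCESS != rc) {
        m->usable = 0;
        return rc;
    }
    m->usable = 1;
    return SKETCH_SUCCESS;
}

/* Picks the fast transport only if its add_procs step succeeded. */
enum sketch_transport sketch_select_transport(int force_failure)
{
    struct sketch_module fast = { NULL, 0 };

    if (SKETCH_SUCCESS == sketch_add_procs(&fast, force_failure)) {
        return SKETCH_FAST;
    }
    return SKETCH_FALLBACK;
}
```

In this toy model, sketch_select_transport(1) selects the fallback and 
sketch_select_transport(0) selects the fast transport. The point of interest 
is the opposite case: if any level swallowed the error, or if other code kept 
using the module despite usable == 0, initialization would appear to succeed 
and the first real operation would touch a NULL queue.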

