On Jun 2, 2010, at 5:08 AM, Sylvain Jeaugey wrote:

> It must be because create_cq actually creates cqs. Try applying this
> patch, which makes create_cq_compat() *not* create the cqs and return an
> error instead:
> ========================================================================
> diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
> --- a/ompi/mca/btl/openib/btl_openib.c  Fri May 28 14:50:25 2010 +0200
> +++ b/ompi/mca/btl/openib/btl_openib.c  Wed Jun 02 10:56:57 2010 +0200
> @@ -146,6 +146,7 @@
>           int cqe, void *cq_context, struct ibv_comp_channel *channel,
>           int comp_vector)
>   {
> +    return OMPI_ERROR;
>   #if OMPI_IBV_CREATE_CQ_ARGS == 3
>       return ibv_create_cq(context, cqe, channel);
>   #else
> ========================================================================

Don't you mean return NULL?  This function is supposed to return a (struct 
ibv_cq *).

> You should see MPI_Init complete nicely and your application segfault on
> the next MPI operation.

That wouldn't surprise me if you return OMPI_ERROR here, since the caller is 
expecting a pointer return value (OMPI_ERROR != NULL, so the NULL check on the 
result of ibv_create_cq_compat() won't detect the failure).
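
To make that concrete, here is a minimal sketch of the failure mode. It is not 
the actual Open MPI code: struct fake_cq, FAKE_OMPI_ERROR (assumed here to be 
-1) and all of the function names are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; neither is the real Open MPI / verbs definition. */
#define FAKE_OMPI_ERROR (-1)        /* assumption: a nonzero integer error code */
struct fake_cq { int unused; };     /* placeholder for struct ibv_cq */

/* Buggy variant: an integer error code forced into a pointer return value. */
struct fake_cq *create_cq_returns_error_code(void)
{
    return (struct fake_cq *)(intptr_t)FAKE_OMPI_ERROR;
}

/* Corrected variant: failure is reported as NULL. */
struct fake_cq *create_cq_returns_null(void)
{
    return NULL;
}

/* The kind of check a caller of a pointer-returning function performs.
 * Returns 1 if the failure is detected, 0 if it slips through. */
int caller_detects_failure(const struct fake_cq *cq)
{
    return cq == NULL;
}

/* Self-check: only the NULL-returning variant trips the caller's check. */
void demo_check(void)
{
    assert(caller_detects_failure(create_cq_returns_error_code()) == 0);
    assert(caller_detects_failure(create_cq_returns_null()) == 1);
}
```

With the first variant the caller sees a non-NULL "cq", carries on, and only 
falls over later when something dereferences it, which would match MPI_Init 
completing and the segfault showing up on the next MPI operation.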

Sidenote: why did we call it ibv_create_cq_compat()?  That seems like a 
namespace violation, and is quite confusing.  :-\

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

