On Tue, 1 Jun 2010, Jeff Squyres wrote:
> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>
>> In my case, the error happens in:
>>   mca_btl_openib_add_procs()
>>     mca_btl_openib_size_queues()
>>       adjust_cq()
>>         ibv_create_cq_compat()
>>           ibv_create_cq()
>
> Can you nail this down any further? If I modify adjust_cq() to always
> return OMPI_ERROR, I see the openib BTL fail over properly to the TCP
> BTL.

It must be because create_cq actually creates the CQs. Try applying this
patch, which makes ibv_create_cq_compat() *not* create the CQs and return
an error instead:
========================================================================
diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c Fri May 28 14:50:25 2010 +0200
+++ b/ompi/mca/btl/openib/btl_openib.c Wed Jun 02 10:56:57 2010 +0200
@@ -146,6 +146,7 @@
         int cqe, void *cq_context, struct ibv_comp_channel *channel,
         int comp_vector)
 {
+    return OMPI_ERROR;
 #if OMPI_IBV_CREATE_CQ_ARGS == 3
     return ibv_create_cq(context, cqe, channel);
 #else
========================================================================
You should see MPI_Init complete nicely and your application segfault on
the next MPI operation.
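
For reference, here is a minimal, self-contained sketch of the propagation
one would hope for when the failure is injected at the lowest level. It is
not Open MPI code: every name in it (fake_create_cq, fake_adjust_cq,
select_transport, the SKETCH_* codes) is a hypothetical stand-in. The only
assumption borrowed from the real API is that ibv_create_cq() reports
failure by returning NULL, so the caller has to turn that NULL into an
error code before the failover to TCP can happen.

```c
#include <stddef.h>

/* Hypothetical stand-ins: none of these are Open MPI or libibverbs symbols. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };
enum transport { TRANSPORT_OPENIB, TRANSPORT_TCP };

struct fake_cq { int cqe; };

/* Stand-in for the lowest-level creation call. Like ibv_create_cq(),
 * it signals failure by returning NULL. inject_failure simulates the
 * fault that the patch above forces. */
static struct fake_cq *fake_create_cq(struct fake_cq *storage, int cqe,
                                      int inject_failure)
{
    if (inject_failure) {
        return NULL;
    }
    storage->cqe = cqe;
    return storage;
}

/* Stand-in for the adjust_cq() level: converts a NULL CQ into an
 * error code so the failure is visible to the caller. */
static int fake_adjust_cq(struct fake_cq *storage, struct fake_cq **cq_out,
                          int cqe, int inject_failure)
{
    *cq_out = fake_create_cq(storage, cqe, inject_failure);
    return (NULL == *cq_out) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

/* Stand-in for the add_procs level: fall back to TCP whenever the
 * CQ setup reports an error. */
static enum transport select_transport(int inject_failure)
{
    struct fake_cq storage;
    struct fake_cq *cq = NULL;

    if (SKETCH_SUCCESS != fake_adjust_cq(&storage, &cq, 128, inject_failure)) {
        return TRANSPORT_TCP;
    }
    return TRANSPORT_OPENIB;
}
```

With this structure, select_transport(1) picks TCP and select_transport(0)
keeps the openib-like path. If any level drops the NULL check, the failure
is never reported upward and the unusable CQ pointer is only touched later,
which is one plausible way to end up with a clean MPI_Init followed by a
segfault.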
Sylvain