On Tue, 1 Jun 2010, Jeff Squyres wrote:

> On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
>
>> In my case, the error happens in:
>>    mca_btl_openib_add_procs()
>>      mca_btl_openib_size_queues()
>>        adjust_cq()
>>          ibv_create_cq_compat()
>>            ibv_create_cq()
>
> Can you nail this down any further? If I modify adjust_cq() to always return OMPI_ERROR, I see the openib BTL fail over properly to the TCP BTL.
That must be because, in your test, create_cq still actually creates the CQs. Try applying this patch, which makes create_cq_compat() *not* create the CQs and return an error instead:
========================================================================
diff -r 13df81d1d862 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c  Fri May 28 14:50:25 2010 +0200
+++ b/ompi/mca/btl/openib/btl_openib.c  Wed Jun 02 10:56:57 2010 +0200
@@ -146,6 +146,7 @@
         int cqe, void *cq_context, struct ibv_comp_channel *channel,
         int comp_vector)
 {
+    return OMPI_ERROR;
 #if OMPI_IBV_CREATE_CQ_ARGS == 3
     return ibv_create_cq(context, cqe, channel);
 #else
========================================================================

You should see MPI_Init complete nicely and your application segfault on the next MPI operation.
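
To illustrate the difference between the two experiments, here is a minimal, self-contained sketch of one way this symptom could arise. It is not Open MPI code and not a diagnosis: every name prefixed with sketch_ is hypothetical, and the assumption that the failure is dropped between the creation call and the caller's return code is mine. It only shows why forcing the outer function to return an error can trigger failover while a failure in the inner creation call can go unnoticed until the queue is first used.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are Open MPI or verbs symbols. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

struct sketch_cq  { int depth; };
struct sketch_btl { struct sketch_cq *cq; };

/* Models a creation call that fails: no queue is produced. */
static struct sketch_cq *sketch_create_cq(int depth)
{
    (void)depth;
    return NULL;
}

/* Assumed buggy variant: stores the result but never inspects it,
 * so the caller sees success and initialization carries on. */
static int sketch_adjust_cq_unchecked(struct sketch_btl *btl, int depth)
{
    btl->cq = sketch_create_cq(depth);
    return SKETCH_SUCCESS;
}

/* Checked variant: a failed creation becomes an error code,
 * which lets the caller drop this transport and fail over. */
static int sketch_adjust_cq_checked(struct sketch_btl *btl, int depth)
{
    btl->cq = sketch_create_cq(depth);
    return (NULL == btl->cq) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

/* Any later operation that dereferences btl->cq without this test
 * would crash in the unchecked case. */
static int sketch_cq_usable(const struct sketch_btl *btl)
{
    return NULL != btl->cq;
}
```

With the unchecked variant, setup reports success while btl->cq is still NULL, so the first operation that touches the queue is where the crash would show up.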

Sylvain
