Hi Edgar,

Could you send me your conf file? I'll try to reproduce it.
Maybe run with --mca btl_base_verbose 20 (or something similar) to see what the code that parses this field in the conf file is finding.

Howard

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Edgar Gabriel
Sent: Thursday, August 28, 2014 3:40 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] segfault in openib component on trunk

To add another piece of information that I just found: the segfault only occurs if I have a particular MCA parameter set in my mca-params.conf file, namely

btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,3

Has the syntax for this parameter changed, or should/can I get rid of it?

Thanks
Edgar

On 08/28/2014 04:19 PM, Edgar Gabriel wrote:
> We have recently been having problems running trunk with the openib component
> enabled on one of our clusters. The problem occurs right in the
> initialization part. Here is the stack right before the segfault:
>
> ---snip---
> (gdb) where
> #0 mca_btl_openib_tune_endpoint (openib_btl=0x762a40, endpoint=0x7d9660) at btl_openib.c:470
> #1 0x00007f1062f105c4 in mca_btl_openib_add_procs (btl=0x762a40, nprocs=2, procs=0x759be0, peers=0x762440, reachable=0x7fff22dd16f0) at btl_openib.c:1093
> #2 0x00007f106316102c in mca_bml_r2_add_procs (nprocs=2, procs=0x759be0, reachable=0x7fff22dd16f0) at bml_r2.c:201
> #3 0x00007f10615c0dd5 in mca_pml_ob1_add_procs (procs=0x70dc00, nprocs=2) at pml_ob1.c:334
> #4 0x00007f106823ed84 in ompi_mpi_init (argc=1, argv=0x7fff22dd1da8, requested=0, provided=0x7fff22dd184c) at runtime/ompi_mpi_init.c:790
> #5 0x00007f1068273a2c in MPI_Init (argc=0x7fff22dd188c, argv=0x7fff22dd1880) at init.c:84
> #6 0x00000000004008e7 in main (argc=1, argv=0x7fff22dd1da8) at hello_world.c:13
> ---snip---
>
> In line 538 of the file containing the mca_btl_openib_tune_endpoint
> routine, the strcmp operation fails because recv_qps is a NULL pointer.
>
> ---snip---
>
> if(0 != strcmp(mca_btl_openib_component.receive_queues, recv_qps)) {
>
> ---snip---
>
> Does anybody have an idea what might be going wrong and how to
> resolve it? Just to confirm, everything works perfectly with the 1.8
> series on that very same cluster.
>
> Thanks
> Edgar
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15746.php

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15747.php