I suspect the problem is here:

/**
+ * MOSIX BTL component.
+ */
+struct mca_btl_base_component_t {
+    mca_btl_base_component_2_0_0_t super;  /**< base BTL component */
+    mca_btl_mosix_module_t mosix_module;   /**< local module */
+};
+typedef struct mca_btl_base_component_t mca_btl_mosix_component_t;
+
+OMPI_MODULE_DECLSPEC extern mca_btl_mosix_component_t mca_btl_mosix_component;
+
You redefined the mca_btl_base_component_t struct. What we usually do is define a new struct:

struct mca_btl_mosix_component_t {
    mca_btl_base_component_t super;        /**< base BTL component */
    mca_btl_mosix_module_t mosix_module;   /**< local module */
};
typedef struct mca_btl_mosix_component_t mca_btl_mosix_component_t;

You can then overload that component with your additional info, leaving the base component to contain the required minimal elements.

On Apr 1, 2012, at 1:59 AM, Alex Margolin wrote:

> I traced the problem to the BML component:
> Index: ompi/mca/bml/r2/bml_r2.c
> ===================================================================
> --- ompi/mca/bml/r2/bml_r2.c (revision 26191)
> +++ ompi/mca/bml/r2/bml_r2.c (working copy)
> @@ -105,6 +105,8 @@
> }
> }
> if (NULL == btl_names_argv || NULL == btl_names_argv[i]) {
> + printf("\n\nR1: %p\n\n",
> btl->btl_component->btl_version.mca_component_name);
> + printf("\n\nR2: %s\n\n",
> btl->btl_component->btl_version.mca_component_name);
> opal_argv_append_nosize(&btl_names_argv,
>
> btl->btl_component->btl_version.mca_component_name);
> }
>
> I Get (white-spaces removed) for normal run:
> R1: 0x7f820e3c31d8
> R2: self
> R1: 0x7f820e13c598
> R2: tcp
> ... and for my module:
> R1: 0x38
> - and then the segmentation fault.
> I guess it has something to do with the way I initialize my component - I'll
> resume debugging after lunch.
>
> Alex
>
> On 03/31/2012 07:04 PM, Alex Margolin wrote:
>>
>> P.S. I get the following Error - I'm pretty sure my BTL is to blame here:
>>
>> alex@singularity:~/huji/benchmarks/simple$ mpirun -mca btl_base_verbose 100
>> -mca btl self,mosix hello
>> [singularity:10838] mca: base: component_find: unable to open
>> /usr/local/lib/openmpi/mca_mpool_sm: libmca_common_sm.so.0: cannot open
>> shared object file: No such file or directory (ignored)
>> [singularity:10838] mca: base: components_open: Looking for btl components
>> [singularity:10838] mca: base: components_open: opening btl components
>> [singularity:10838] mca: base: components_open: found loaded component mosix
>> [singularity:10838] mca: base: components_open: component mosix register
>> function successful
>> [singularity:10838] mca: base: components_open: component mosix open
>> function successful
>> [singularity:10838] mca: base: components_open: found loaded component self
>> [singularity:10838] mca: base: components_open: component self has no
>> register function
>> [singularity:10838] mca: base: components_open: component self open function
>> successful
>> [singularity:10838] mca: base: component_find: unable to open
>> /usr/local/lib/openmpi/mca_coll_sm: libmca_common_sm.so.0: cannot open
>> shared object file: No such file or directory (ignored)
>> [singularity:10838] select: initializing btl component mosix
>> [singularity:10838] select: init of component mosix returned success
>> [singularity:10838] select: initializing btl component self
>> [singularity:10838] select: init of component self returned success
>> [singularity:10838] *** Process received signal ***
>> [singularity:10838] Signal: Segmentation fault (11)
>> [singularity:10838] Signal code: Address not mapped (1)
>> [singularity:10838] Failing at address: 0x30
>> [singularity:10838] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36420)
>> [0x7fa94a3cd420]
>> [singularity:10838] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x84391)
>> [0x7fa94a41b391]
>> [singularity:10838] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__strdup+0x16)
>> [0x7fa94a41b086]
>> [singularity:10838] [ 3]
>> /usr/local/lib/libmpi.so.0(opal_argv_append_nosize+0xf7) [0x7fa94add66a4]
>> [singularity:10838] [ 4] /usr/local/lib/openmpi/mca_bml_r2.so(+0x1cf5)
>> [0x7fa946177cf5]
>> [singularity:10838] [ 5] /usr/local/lib/openmpi/mca_bml_r2.so(+0x1e50)
>> [0x7fa946177e50]
>> [singularity:10838] [ 6]
>> /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x12f)
>> [0x7fa946382b6d]
>> [singularity:10838] [ 7] /usr/local/lib/libmpi.so.0(ompi_mpi_init+0x909)
>> [0x7fa94acd1549]
>> [singularity:10838] [ 8] /usr/local/lib/libmpi.so.0(MPI_Init+0x16c)
>> [0x7fa94ad033ec]
>> [singularity:10838] [ 9]
>> /home/alex/huji/benchmarks/simple/hello(_ZN3MPI4InitERiRPPc+0x23) [0x409e2d]
>> [singularity:10838] [10] /home/alex/huji/benchmarks/simple/hello(main+0x22)
>> [0x408f66]
>> [singularity:10838] [11]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa94a3b830d]
>> [singularity:10838] [12] /home/alex/huji/benchmarks/simple/hello() [0x408e89]
>> [singularity:10838] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 10838 on node singularity exited
>> on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> alex@singularity:~/huji/benchmarks/simple$ mpirun -mca btl self,tcp hello
>> [singularity:10841] mca: base: component_find: unable to open
>> /usr/local/lib/openmpi/mca_mpool_sm: libmca_common_sm.so.0: cannot open
>> shared object file: No such file or directory (ignored)
>> [singularity:10841] mca: base: component_find: unable to open
>> /usr/local/lib/openmpi/mca_coll_sm: libmca_common_sm.so.0: cannot open
>> shared object file: No such file or directory (ignored)
>> Hello world!
>> alex@singularity:~/huji/benchmarks/simple$
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
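
The layout suggested at the top of this message can be sketched in isolation as follows. This is only a minimal sketch and not Open MPI code: base_component_t, demo_module_t, demo_component_t, component_name and demo_from_base are all hypothetical stand-ins, chosen so the example compiles without the Open MPI headers. It shows the general C idiom of embedding the base struct as the first member of a separately named struct, so that framework code holding a base pointer and component code holding the full struct refer to the same object.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical stand-in for a framework-owned base component type. */
typedef struct base_component {
    const char *name;
} base_component_t;

/* Hypothetical stand-in for the component's private module data. */
typedef struct demo_module {
    int endpoint_count;
} demo_module_t;

/* The component gets its OWN struct tag and embeds the base as the
 * first member; the base type itself is left untouched. */
typedef struct demo_component {
    base_component_t super;   /* base component, must stay first */
    demo_module_t module;     /* component-specific additions */
} demo_component_t;

/* Compile-time check that the base really sits at offset 0. */
static_assert(offsetof(demo_component_t, super) == 0,
              "base component must be the first member");

demo_component_t demo_component = {
    .super = { .name = "demo" },
    .module = { .endpoint_count = 0 },
};

/* Framework-side code: knows only about the base type. */
const char *component_name(const base_component_t *c)
{
    return c->name;
}

/* Component-side code: recovers the full struct from a base pointer.
 * This is valid because a pointer to a struct and a pointer to its
 * first member convert to each other. */
demo_component_t *demo_from_base(base_component_t *base)
{
    return (demo_component_t *)base;
}
```

Under these assumptions, framework-style code would call component_name(&demo_component.super), while the component itself can call demo_from_base() on the same pointer to reach its extra fields.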