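It seems that ompi_free_list_init() in libnbc_open() failed for some reason. libnbc_open() returns as soon as that call fails, before it constructs mca_coll_libnbc_component.active_requests. That would explain why active_requests is not initialized, and hence the assertion failure in libnbc_close().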
The patch below might help, but it still doesn't explain why the free list initialization failed:

diff --git a/ompi/mca/coll/libnbc/coll_libnbc_component.c b/ompi/mca/coll/libnbc/coll_libnbc_component.c
index 1a2a81a..2d7b82c 100644
--- a/ompi/mca/coll/libnbc/coll_libnbc_component.c
+++ b/ompi/mca/coll/libnbc/coll_libnbc_component.c
@@ -88,6 +88,7 @@ libnbc_open(void)
     int ret;

     OBJ_CONSTRUCT(&mca_coll_libnbc_component.requests, ompi_free_list_t);
+    OBJ_CONSTRUCT(&mca_coll_libnbc_component.active_requests, opal_list_t);
     ret = ompi_free_list_init(&mca_coll_libnbc_component.requests,
                               sizeof(ompi_coll_libnbc_request_t),
                               OBJ_CLASS(ompi_coll_libnbc_request_t),
@@ -97,7 +98,6 @@ libnbc_open(void)
                               NULL);
     if (OMPI_SUCCESS != ret) return ret;

-    OBJ_CONSTRUCT(&mca_coll_libnbc_component.active_requests, opal_list_t);
     /* note: active comms is the number of communicators who have had
        a non-blocking collective started */
     mca_coll_libnbc_component.active_comms = 0;

It looks like an issue with detecting the proper L2 cache line size on power. I'll take a look over the weekend.

Regards
--Nysal

On Tue, Feb 3, 2015 at 8:58 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On a Linux/PPC64 system I see the failure below from a build of the
> current master tarball.
> This build was configured with
>     --prefix=... --enable-debug \
>     CFLAGS=-m64 --with-wrapper-cflags=-m64 \
>     CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \
>     FCFLAGS=-m64 --with-wrapper-fcflags=-m64
>
> I am not sure if putting "-m64" in both the *FLAGS and wrapper flags is
> required, but am confident the error is unrelated.
>
> -Paul
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
> [pcp-k-422:08534] mca: base: components_open: component coll / libnbc open
> function failed
> ring_c:
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118:
> libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) ==
> ((opal_object_t *)
> (&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
> [pcp-k-422:08534] *** Process received signal ***
> [pcp-k-422:08534] Signal: Aborted (6)
> [pcp-k-422:08534] Signal code: (-6)
> [pcp-k-422:08534] [ 0] [0x3fff8bd90478]
> [pcp-k-422:08534] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff8b9fc510]
> [pcp-k-422:08534] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff8ba01be4]
> [pcp-k-422:08534] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff8b9f22ac]
> [pcp-k-422:08534] [ 4]
> /lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff8b9f239c]
> [pcp-k-422:08534] [ 5]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff8a190088]
> [pcp-k-422:08534] [ 6]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff8b758308]
> [pcp-k-422:08534] [ 7]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff8b757c5c]
> [pcp-k-422:08534] [ 8]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff8b757778]
> [pcp-k-422:08534] [ 9]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff8b76a620]
> [pcp-k-422:08534] [10]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff8bc33d14]
> [pcp-k-422:08534] [11]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff8bc821bc]
> [pcp-k-422:08534] [12] examples/ring_c[0x10000a20]
> [pcp-k-422:08534] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff8b9e2b6c]
> [pcp-k-422:08534] [14]
> /lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff8b9e2d98]
> [pcp-k-422:08534] *** End of error message ***
> [pcp-k-422:08535] mca: base: components_open: component coll / libnbc open
> function failed
> ring_c:
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118:
> libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) ==
> ((opal_object_t *)
> (&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
> [pcp-k-422:08535] *** Process received signal ***
> [pcp-k-422:08535] Signal: Aborted (6)
> [pcp-k-422:08535] Signal code: (-6)
> [pcp-k-422:08535] [ 0] [0x3fff99e30478]
> [pcp-k-422:08535] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff99a9c510]
> [pcp-k-422:08535] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff99aa1be4]
> [pcp-k-422:08535] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff99a922ac]
> [pcp-k-422:08535] [ 4]
> /lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff99a9239c]
> [pcp-k-422:08535] [ 5]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff98230088]
> [pcp-k-422:08535] [ 6]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff997f8308]
> [pcp-k-422:08535] [ 7]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff997f7c5c]
> [pcp-k-422:08535] [ 8]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff997f7778]
> [pcp-k-422:08535] [ 9]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff9980a620]
> [pcp-k-422:08535] [10]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff99cd3d14]
> [pcp-k-422:08535] [11]
> /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff99d221bc]
> [pcp-k-422:08535] [12] examples/ring_c[0x10000a20]
> [pcp-k-422:08535] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff99a82b6c]
> [pcp-k-422:08535] [14]
> /lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff99a82d98]
> [pcp-k-422:08535] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node pcp-k-422 exited on
> signal 6 (Aborted).
> --------------------------------------------------------------------------
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/16902.php
>