It seems the ompi_free_list_init() call in libnbc_open() failed for some reason.
That would explain why mca_coll_libnbc_component.active_requests was never
constructed (the early return skips its OBJ_CONSTRUCT), and hence the crash
in libnbc_close().

The patch below might help, but it still doesn't explain why the free list
initialization failed:
diff --git a/ompi/mca/coll/libnbc/coll_libnbc_component.c b/ompi/mca/coll/libnbc/coll_libnbc_component.c
index 1a2a81a..2d7b82c 100644
--- a/ompi/mca/coll/libnbc/coll_libnbc_component.c
+++ b/ompi/mca/coll/libnbc/coll_libnbc_component.c
@@ -88,6 +88,7 @@ libnbc_open(void)
     int ret;

     OBJ_CONSTRUCT(&mca_coll_libnbc_component.requests, ompi_free_list_t);
+    OBJ_CONSTRUCT(&mca_coll_libnbc_component.active_requests, opal_list_t);
     ret = ompi_free_list_init(&mca_coll_libnbc_component.requests,
                               sizeof(ompi_coll_libnbc_request_t),
                               OBJ_CLASS(ompi_coll_libnbc_request_t),
@@ -97,7 +98,6 @@ libnbc_open(void)
                               NULL);
     if (OMPI_SUCCESS != ret) return ret;

-    OBJ_CONSTRUCT(&mca_coll_libnbc_component.active_requests, opal_list_t);
     /* note: active comms is the number of communicators who have had
        a non-blocking collective started */
     mca_coll_libnbc_component.active_comms = 0;

It looks like an issue with detecting the correct L2 cache line size on
POWER.
I'll take a look over the weekend.

Regards
--Nysal

On Tue, Feb 3, 2015 at 8:58 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On a Linux/PPC64 system I see the failure below from a build of the
> current master tarball.
> This build was configured with
>    --prefix=... --enable-debug \
>   CFLAGS=-m64 --with-wrapper-cflags=-m64 \
>   CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \
>   FCFLAGS=-m64 --with-wrapper-fcflags=-m64
>
> I am not sure if putting "-m64" in both the *FLAGS and wrapper flags is
> required, but am confident the error is unrelated.
>
> -Paul
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> [pcp-k-422:08534] mca: base: components_open: component coll / libnbc open function failed
> ring_c: /home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118: libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
> [pcp-k-422:08534] *** Process received signal ***
> [pcp-k-422:08534] Signal: Aborted (6)
> [pcp-k-422:08534] Signal code:  (-6)
> [pcp-k-422:08534] [ 0] [0x3fff8bd90478]
> [pcp-k-422:08534] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff8b9fc510]
> [pcp-k-422:08534] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff8ba01be4]
> [pcp-k-422:08534] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff8b9f22ac]
> [pcp-k-422:08534] [ 4] /lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff8b9f239c]
> [pcp-k-422:08534] [ 5] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff8a190088]
> [pcp-k-422:08534] [ 6] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff8b758308]
> [pcp-k-422:08534] [ 7] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff8b757c5c]
> [pcp-k-422:08534] [ 8] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff8b757778]
> [pcp-k-422:08534] [ 9] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff8b76a620]
> [pcp-k-422:08534] [10] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff8bc33d14]
> [pcp-k-422:08534] [11] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff8bc821bc]
> [pcp-k-422:08534] [12] examples/ring_c[0x10000a20]
> [pcp-k-422:08534] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff8b9e2b6c]
> [pcp-k-422:08534] [14] /lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff8b9e2d98]
> [pcp-k-422:08534] *** End of error message ***
> [pcp-k-422:08535] mca: base: components_open: component coll / libnbc open function failed
> ring_c: /home/phargrov/OMPI/openmpi-master-linux-ppc64/openmpi-dev-803-g5919b63/ompi/mca/coll/libnbc/coll_libnbc_component.c:118: libnbc_close: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (&mca_coll_libnbc_component.active_requests))->obj_magic_id' failed.
> [pcp-k-422:08535] *** Process received signal ***
> [pcp-k-422:08535] Signal: Aborted (6)
> [pcp-k-422:08535] Signal code:  (-6)
> [pcp-k-422:08535] [ 0] [0x3fff99e30478]
> [pcp-k-422:08535] [ 1] /lib64/libc.so.6(gsignal-0x155030)[0x3fff99a9c510]
> [pcp-k-422:08535] [ 2] /lib64/libc.so.6(abort-0x150094)[0x3fff99aa1be4]
> [pcp-k-422:08535] [ 3] /lib64/libc.so.6(+0x572ac)[0x3fff99a922ac]
> [pcp-k-422:08535] [ 4] /lib64/libc.so.6(__assert_fail-0x15ddac)[0x3fff99a9239c]
> [pcp-k-422:08535] [ 5] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/openmpi/mca_coll_libnbc.so(+0x9088)[0x3fff98230088]
> [pcp-k-422:08535] [ 6] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_component_close-0xed5e8)[0x3fff997f8308]
> [pcp-k-422:08535] [ 7] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(+0xa9c5c)[0x3fff997f7c5c]
> [pcp-k-422:08535] [ 8] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_components_open-0xee088)[0x3fff997f7778]
> [pcp-k-422:08535] [ 9] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libopen-pal.so.0(mca_base_framework_open-0xdc3f8)[0x3fff9980a620]
> [pcp-k-422:08535] [10] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(ompi_mpi_init-0x12d5fc)[0x3fff99cd3d14]
> [pcp-k-422:08535] [11] /home/phargrov/OMPI/openmpi-master-linux-ppc64/INST/lib/libmpi.so.0(MPI_Init-0xe4734)[0x3fff99d221bc]
> [pcp-k-422:08535] [12] examples/ring_c[0x10000a20]
> [pcp-k-422:08535] [13] /lib64/libc.so.6(+0x47b6c)[0x3fff99a82b6c]
> [pcp-k-422:08535] [14] /lib64/libc.so.6(__libc_start_main-0x16caf8)[0x3fff99a82d98]
> [pcp-k-422:08535] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node pcp-k-422 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/16902.php
>
