FYI, that segfault problem did not occur when I tested 3.1.2rc1. Thanks,
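P.S. In case a quicker check than a full IMB_EXT run is useful: the crash in the backtrace below happens inside MPI_Win_create, while the osc framework is selecting a component. Below is a minimal sketch of that call path; the buffer size, displacement unit, and info argument are arbitrary choices of mine, not the exact parameters IMB uses.

/* Minimal sketch of the failing path from the backtrace below:
 * MPI_Win_create -> ompi_osc_base_select -> ompi_osc_rdma_component_query.
 * Window size, disp_unit, and info are illustrative, not IMB's values.
 * Build with: mpicc win_create_check.c -o win_create_check
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Expose a small local buffer as an RMA window.  The backtrace shows
     * the segfault during osc component selection inside this call,
     * before any actual RMA traffic is issued. */
    const MPI_Aint size = 4096;
    void *base = malloc(size);
    MPI_Win win;
    MPI_Win_create(base, size, 1 /* disp_unit */, MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    if (rank == 0) {
        printf("MPI_Win_create completed\n");
    }

    MPI_Win_free(&win);
    free(base);
    MPI_Finalize();
    return 0;
}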
> On Aug 17, 2018, at 10:28 AM, Pavel Shamis <pasharesea...@gmail.com> wrote:
>
> It looks to me like an mxm-related failure?
>
> On Thu, Aug 16, 2018 at 1:51 PM Vallee, Geoffroy R. <valle...@ornl.gov> wrote:
> Hi,
>
> I ran some tests on Summitdev here at ORNL:
> - The UCX problem is solved, and I get the expected results for the tests that I am running (NetPIPE and IMB).
> - Without UCX:
>   * The performance numbers are below what I would expect, but at this point I believe the slight shortfall is due to other users running on other parts of the system.
>   * I also encountered the following problem while running IMB_EXT, and I now realize that I had the same problem with 2.4.1rc1 but did not catch it at the time:
>
> [summitdev-login1:112517:0] Caught signal 11 (Segmentation fault)
> [summitdev-r0c2n13:91094:0] Caught signal 11 (Segmentation fault)
> ==== backtrace ====
>  2 0x0000000000073864 mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>  3 0x0000000000073fa4 mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>  4 0x0000000000017b24 ompi_osc_rdma_component_query()  osc_rdma_component.c:0
>  5 0x00000000000d4634 ompi_osc_base_select()  ??:0
>  6 0x0000000000065e84 ompi_win_create()  ??:0
>  7 0x00000000000a2488 PMPI_Win_create()  ??:0
>  8 0x000000001000b28c IMB_window()  ??:0
>  9 0x0000000010005764 IMB_init_buffers_iter()  ??:0
> 10 0x0000000010001ef8 main()  ??:0
> 11 0x0000000000024980 generic_start_main.isra.0()  libc-start.c:0
> 12 0x0000000000024b74 __libc_start_main()  ??:0
> ===================
> ==== backtrace ====
>  2 0x0000000000073864 mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>  3 0x0000000000073fa4 mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>  4 0x0000000000017b24 ompi_osc_rdma_component_query()  osc_rdma_component.c:0
>  5 0x00000000000d4634 ompi_osc_base_select()  ??:0
>  6 0x0000000000065e84 ompi_win_create()  ??:0
>  7 0x00000000000a2488 PMPI_Win_create()  ??:0
>  8 0x000000001000b28c IMB_window()  ??:0
>  9 0x0000000010005764 IMB_init_buffers_iter()  ??:0
> 10 0x0000000010001ef8 main()  ??:0
> 11 0x0000000000024980 generic_start_main.isra.0()  libc-start.c:0
> 12 0x0000000000024b74 __libc_start_main()  ??:0
> ===================
>
> FYI, the 2.x series is not important to me, so it can stay as is. I will move on to testing 3.1.2rc1.
>
> Thanks,
>
> > On Aug 15, 2018, at 6:07 PM, Jeff Squyres (jsquyres) via devel <devel@lists.open-mpi.org> wrote:
> >
> > Per our discussion over the weekend and on the weekly webex yesterday, we're releasing v2.1.5. There are only two changes:
> >
> > 1. A trivial link issue for UCX.
> > 2. A fix for the vader BTL issue. This is how I described it in NEWS:
> >
> > - A subtle race condition bug was discovered in the "vader" BTL
> >   (shared memory communications) that, in rare instances, can cause
> >   MPI processes to crash or incorrectly classify (or effectively drop)
> >   an MPI message sent via shared memory.  If you are using the "ob1"
> >   PML with "vader" for shared memory communication (note that vader is
> >   the default for shared memory communication with ob1), you need to
> >   upgrade to v2.1.5 to fix this issue.  You may also upgrade to the
> >   following versions to fix this issue:
> >   - Open MPI v3.0.1 (released March 2018) or later in the v3.0.x series
> >   - Open MPI v3.1.2 (expected end of August 2018) or later
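As an illustration of what "shared memory communication with ob1" means in practice, here is a minimal sketch of the kind of same-node point-to-point traffic the fix above covers. This is my own example, not code from the benchmark or from the fix, and whether a given message actually takes the vader path depends on your build and MCA settings.

/* Minimal sketch of same-node point-to-point traffic of the kind the
 * vader fix covers.  With the ob1 PML and default settings this would
 * normally use the vader shared-memory BTL; actual component selection
 * depends on the build and MCA settings.  Run with two ranks placed on
 * the same node.  Build with: mpicc vader_path_check.c -o vader_path_check
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One short message from rank 0 to rank 1; with both ranks on the
     * same node this is exactly the traffic class the NEWS entry above
     * describes. */
    char buf[64] = {0};
    if (rank == 0 && size > 1) {
        strncpy(buf, "hello over shared memory", sizeof(buf) - 1);
        MPI_Send(buf, (int)sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received: \"%s\"\n", buf);
    }

    MPI_Finalize();
    return 0;
}

To pin the components explicitly rather than rely on the defaults, you can run it with something like: mpirun -np 2 --mca pml ob1 --mca btl vader,self ./vader_path_check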
> >
> > This vader fix was deemed serious enough to warrant a 2.1.5 release.
> > This really will be the end of the 2.1.x series. Trust me; my name is
> > Joe Isuzu.
> >
> > 2.1.5rc1 will be available from the usual location in a few minutes
> > (the website will update in about 7 minutes):
> >
> >     https://www.open-mpi.org/software/ompi/v2.1/
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel