Ralph,

See my response to Larry. The impossibly large value was a figment of gdb's imagination.

This system has worked for Open MPI when it was still at 11.0. I cannot say if the current problem is w/ FreeBSD-11.1 (e.g. its compiler) or with Open MPI. I am trying a gcc-based build now.

-Paul

On Wed, Aug 30, 2017 at 4:22 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Yeah, that caught my eye too, as that is impossibly large. We only have a
> handful of active queues - looks to me like there is some kind of alignment
> issue.
>
> Paul - has this configuration worked with prior versions of OMPI? Or is
> this something new?
>
> Ralph
>
> On Aug 30, 2017, at 4:17 PM, Larry Baker <ba...@usgs.gov> wrote:
>
> Paul,
>
> (gdb) print base->nactivequeues
>
> seems like an extraordinarily large number to me. I don't know what the
> implications of the --enable-debug clang option are. Any chance the
> SEGFAULT is a debugging trap when an uninitialized value is encountered?
>
> The other thought I had is an alignment trap if, for example,
> nactivequeues is a 64-bit int but is not 64-bit aligned. As far as I can
> tell, nactivequeues is a plain int. But, what that is on FreeBSD/amd64, I
> do not know.
>
> Should there be more information in dmesg or a system log file with the
> trap code so you can identify whether it is an instruction fetch (VERY
> unlikely), an operand fetch, or a store that caused the trap?
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 30 Aug 2017, at 3:17:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
>     --prefix=[...] --enable-debug CC=clang CXX=clang++
>     --disable-mpi-fortran --with-hwloc=/usr/local
>
> The CC/CXX settings are to use the system default compilers (rather than
> gcc/g++ in /usr/local/bin).
> The --with-hwloc is to avoid issue #3992
> <https://github.com/open-mpi/ompi/issues/3992> (though I have not
> determined if that impacts this RC).
> When running ring_c I get a SEGV from orterun, for which a gdb backtrace
> is given below.
> The one surprising thing (highlighted) in the backtrace is that both the
> RHS and LHS of the assignment appear to be valid memory locations.
> So, if the backtrace is accurate then I am at a loss as to why a SEGV
> occurs.
>
> -Paul
>
> Program terminated with signal 11, Segmentation fault.
> [...]
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>,
>     fd=<value optimized out>, events=2, callback=<value optimized out>, arg=0x0)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> 1779            ev->ev_pri = base->nactivequeues / 2;
> (gdb) print base->nactivequeues
> $3 = 106201992
> (gdb) print ev->ev_pri
> $4 = 0 '\0'
> (gdb) where
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>,
>     fd=<value optimized out>, events=2, callback=<value optimized out>, arg=0x0)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> #1  0x00000008062e1fd2 in pmix_start_progress_thread ()
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
> #2  0x00000008063047e4 in PMIx_server_init (module=0x806545be8, info=0x802e16a00, ninfo=2)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
> #3  0x00000008062c12f6 in pmix1_server_init (module=0x800b106a0, info=0x7fffffffe290)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
> #4  0x0000000800889f43 in pmix_server_init ()
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/orted/pmix/pmix_server.c:261
> #5  0x0000000803e22d87 in rte_init ()
>     at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:666
> #6  0x000000080084a45e in orte_init (pargc=0x7fffffffe988, pargv=0x7fffffffe980, flags=4)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/runtime/orte_init.c:226
> #7  0x00000000004046a4 in orterun (argc=7, argv=0x7fffffffea18)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/orterun.c:831
> #8  0x0000000000403bc2 in main (argc=7, argv=0x7fffffffea18)
>     at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/main.c:13
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel