Here are the FreeBSD backtraces again, this time from a build configured with --enable-debug and covering all threads:

The 100%-cpu ring_c process:
(gdb) thread apply all where

Thread 2 (Thread 802007400 (LWP 182916/ring_c)):
#0  0x0000000800de7aac in sched_yield () from /lib/libc.so.7
#1  0x00000008013c7a5a in opal_progress ()
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/runtime/opal_progress.c:199
#2  0x00000008008670ec in ompi_mpi_init (argc=1, argv=0x7fffffffd3e0, requested=0, provided=0x7fffffffd328)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/ompi/runtime/ompi_mpi_init.c:618
#3  0x000000080089aefe in PMPI_Init (argc=0x7fffffffd36c, argv=0x7fffffffd360) at pinit.c:84
#4  0x0000000000400963 in main (argc=1, argv=0x7fffffffd3e0) at ring_c.c:19

Thread 1 (Thread 802007800 (LWP 186415/ring_c)):
#0  0x0000000800e2711c in poll () from /lib/libc.so.7
#1  0x0000000800b727fe in poll () from /lib/libthr.so.3
#2  0x000000080142edc1 in poll_dispatch (base=0x8020cd900, tv=0x0)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/poll.c:165
#3  0x0000000801422ca1 in opal_libevent2021_event_base_loop (base=0x8020cd900, flags=1)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/event.c:1631
#4  0x00000008010f2c22 in orte_progress_thread_engine (obj=0x80139b160)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:180
#5  0x0000000800b700a4 in pthread_getprio () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.

The idle ring_c process:
(gdb) thread apply all where

Thread 2 (Thread 802007400 (LWP 183983/ring_c)):
#0  0x0000000800e6c44c in nanosleep () from /lib/libc.so.7
#1  0x0000000800b729d5 in nanosleep () from /lib/libthr.so.3
#2  0x0000000801161618 in orte_routed_base_register_sync (setup=true)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/routed/base/routed_base_fns.c:344
#3  0x0000000802a0a0a2 in init_routes (job=2628321281, ndat=0x0)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/routed/binomial/routed_binomial.c:705
#4  0x00000008011272ce in orte_ess_base_app_setup (db_restrict_local=true)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/ess/base/ess_base_std_app.c:233
#5  0x0000000802401408 in rte_init ()
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/ess/env/ess_env_module.c:146
#6  0x00000008010f2b28 in orte_init (pargc=0x0, pargv=0x0, flags=32)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:158
#7  0x0000000800866bde in ompi_mpi_init (argc=1, argv=0x7fffffffd3e0, requested=0, provided=0x7fffffffd328)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/ompi/runtime/ompi_mpi_init.c:451
#8  0x000000080089aefe in PMPI_Init (argc=0x7fffffffd36c, argv=0x7fffffffd360) at pinit.c:84
#9  0x0000000000400963 in main (argc=1, argv=0x7fffffffd3e0) at ring_c.c:19

Thread 1 (Thread 802007800 (LWP 186412/ring_c)):
#0  0x0000000800e2711c in poll () from /lib/libc.so.7
#1  0x0000000800b727fe in poll () from /lib/libthr.so.3
#2  0x000000080142edc1 in poll_dispatch (base=0x8020cd900, tv=0x0)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/poll.c:165
#3  0x0000000801422ca1 in opal_libevent2021_event_base_loop (base=0x8020cd900, flags=1)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/event.c:1631
#4  0x00000008010f2c22 in orte_progress_thread_engine (obj=0x80139b160)
    at /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:180
#5  0x0000000800b700a4 in pthread_getprio () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.

-Paul

On Fri, Dec 20, 2013 at 2:59 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> This case is not quite like my OpenBSD-5 report.
> On FreeBSD-9 I *can* run singletons, but "-np 2" hangs.
>
> The following hangs:
> $ mpirun -np 2 examples/ring_c
>
> The following complains about the "bogus" btl selection.
> So this is not the same as my problem with OpenBSD-5:
> $ mpirun -mca btl bogus -np 2 examples/ring_c
> [freebsd9-amd64.qemu:05926] mca: base: components_open: component pml / bfo open function failed
> [freebsd9-amd64.qemu:05926] mca: base: components_open: component pml / ob1 open function failed
> [freebsd9-amd64.qemu:05926] PML ob1 cannot be selected
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: freebsd9-amd64.qemu
> Framework: btl
> Component: bogus
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> No available pml components were found!
>
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
>
> This is a fatal error; your MPI process is likely to abort. Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system. You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
> --------------------------------------------------------------------------
>
>
> For the non-bogus case, "top" shows one idle and one active ring_c process:
>   PID USERNAME THR PRI NICE   SIZE   RES STATE  C TIME    WCPU COMMAND
>  5933 phargrov   2  29    0    98M 6384K select 1 0:32 100.00% ring_c
>  5931 phargrov   2  20    0 77844K 4856K select 0 0:00   0.00% orterun
>  5932 phargrov   2  24    0 51652K 4960K select 0 0:00   0.00% ring_c
>
> A backtrace for the 100%-cpu ring_c process:
> (gdb) where
> #0  0x0000000800d9811c in poll () from /lib/libc.so.7
> #1  0x0000000800ae37fe in poll () from /lib/libthr.so.3
> #2  0x00000008013259aa in poll_dispatch ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
> #3  0x000000080131eb50 in opal_libevent2021_event_base_loop ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
> #4  0x000000080106395d in orte_progress_thread_engine ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
> #5  0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
> #6  0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffffbfe000: Bad address.
>
>
> And for the idle ring_c process:
> (gdb) where
> #0  0x0000000800d9811c in poll () from /lib/libc.so.7
> #1  0x0000000800ae37fe in poll () from /lib/libthr.so.3
> #2  0x00000008013259aa in poll_dispatch ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
> #3  0x000000080131eb50 in opal_libevent2021_event_base_loop ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
> #4  0x000000080106395d in orte_progress_thread_engine ()
>     from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
> #5  0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
> #6  0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffffbfe000: Bad address.
>
>
> They look to be the same, but I double checked that these are correct.
>
> -Paul
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900