FWIW:
I've confirmed that this is a REGRESSION relative to 1.7.3, which works fine
on FreeBSD-9.

-Paul


On Fri, Dec 20, 2013 at 3:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> And the FreeBSD backtraces again, this time configured with --enable-debug
> and for all threads:
>
> The 100%-cpu ring_c process:
>
> (gdb) thread apply all where
>
> Thread 2 (Thread 802007400 (LWP 182916/ring_c)):
> #0  0x0000000800de7aac in sched_yield () from /lib/libc.so.7
> #1  0x00000008013c7a5a in opal_progress ()
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/runtime/opal_progress.c:199
> #2  0x00000008008670ec in ompi_mpi_init (argc=1, argv=0x7fffffffd3e0,
> requested=0, provided=0x7fffffffd328)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/ompi/runtime/ompi_mpi_init.c:618
> #3  0x000000080089aefe in PMPI_Init (argc=0x7fffffffd36c,
> argv=0x7fffffffd360) at pinit.c:84
> #4  0x0000000000400963 in main (argc=1, argv=0x7fffffffd3e0) at ring_c.c:19
>
> Thread 1 (Thread 802007800 (LWP 186415/ring_c)):
> #0  0x0000000800e2711c in poll () from /lib/libc.so.7
> #1  0x0000000800b727fe in poll () from /lib/libthr.so.3
> #2  0x000000080142edc1 in poll_dispatch (base=0x8020cd900, tv=0x0)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/poll.c:165
> #3  0x0000000801422ca1 in opal_libevent2021_event_base_loop
> (base=0x8020cd900, flags=1)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/event.c:1631
> #4  0x00000008010f2c22 in orte_progress_thread_engine (obj=0x80139b160)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:180
> #5  0x0000000800b700a4 in pthread_getprio () from /lib/libthr.so.3
> #6  0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffffbfe000: Bad address.
>
>
> The idle ring_c process:
>
> (gdb) thread apply all where
>
> Thread 2 (Thread 802007400 (LWP 183983/ring_c)):
> #0  0x0000000800e6c44c in nanosleep () from /lib/libc.so.7
> #1  0x0000000800b729d5 in nanosleep () from /lib/libthr.so.3
> #2  0x0000000801161618 in orte_routed_base_register_sync (setup=true)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/routed/base/routed_base_fns.c:344
> #3  0x0000000802a0a0a2 in init_routes (job=2628321281, ndat=0x0)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/routed/binomial/routed_binomial.c:705
> #4  0x00000008011272ce in orte_ess_base_app_setup (db_restrict_local=true)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/ess/base/ess_base_std_app.c:233
> #5  0x0000000802401408 in rte_init ()
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/mca/ess/env/ess_env_module.c:146
> #6  0x00000008010f2b28 in orte_init (pargc=0x0, pargv=0x0, flags=32)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:158
> #7  0x0000000800866bde in ompi_mpi_init (argc=1, argv=0x7fffffffd3e0,
> requested=0, provided=0x7fffffffd328)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/ompi/runtime/ompi_mpi_init.c:451
> #8  0x000000080089aefe in PMPI_Init (argc=0x7fffffffd36c,
> argv=0x7fffffffd360) at pinit.c:84
> #9  0x0000000000400963 in main (argc=1, argv=0x7fffffffd3e0) at ring_c.c:19
>
> Thread 1 (Thread 802007800 (LWP 186412/ring_c)):
> #0  0x0000000800e2711c in poll () from /lib/libc.so.7
> #1  0x0000000800b727fe in poll () from /lib/libthr.so.3
> #2  0x000000080142edc1 in poll_dispatch (base=0x8020cd900, tv=0x0)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/poll.c:165
> #3  0x0000000801422ca1 in opal_libevent2021_event_base_loop
> (base=0x8020cd900, flags=1)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/opal/mca/event/libevent2021/libevent/event.c:1631
> #4  0x00000008010f2c22 in orte_progress_thread_engine (obj=0x80139b160)
>     at
> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/openmpi-1.7-latest/orte/runtime/orte_init.c:180
> #5  0x0000000800b700a4 in pthread_getprio () from /lib/libthr.so.3
> #6  0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffffbfe000: Bad address.
>
>
> -Paul
>
>
> On Fri, Dec 20, 2013 at 2:59 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> This case is not quite like my OpenBSD-5 report.
>> On FreeBSD-9 I *can* run singletons, but "-np 2" hangs.
>>
>> The following hangs:
>> $ mpirun -np 2 examples/ring_c
>>
>> The following complains about the "bogus" btl selection.
>> So this is not the same as my problem with OpenBSD-5:
>> $ mpirun -mca btl bogus -np 2 examples/ring_c
>> [freebsd9-amd64.qemu:05926] mca: base: components_open: component pml /
>> bfo open function failed
>> [freebsd9-amd64.qemu:05926] mca: base: components_open: component pml /
>> ob1 open function failed
>> [freebsd9-amd64.qemu:05926] PML ob1 cannot be selected
>> --------------------------------------------------------------------------
>> A requested component was not found, or was unable to be opened.  This
>> means that this component is either not installed or is unable to be
>> used on your system (e.g., sometimes this means that shared libraries
>> that the component requires are unable to be found/loaded).  Note that
>> Open MPI stopped checking at the first component that it did not find.
>>
>> Host:      freebsd9-amd64.qemu
>> Framework: btl
>> Component: bogus
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> No available pml components were found!
>>
>> This means that there are no components of this type installed on your
>> system or all the components reported that they could not be used.
>>
>> This is a fatal error; your MPI process is likely to abort.  Check the
>> output of the "ompi_info" command and ensure that components of this
>> type are available on your system.  You may also wish to check the
>> value of the "component_path" MCA parameter and ensure that it has at
>> least one directory that contains valid MCA components.
>> --------------------------------------------------------------------------
>>
>>
>> For the non-bogus case, "top" shows one idle and one active ring_c process:
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  5933 phargrov    2  29    0    98M  6384K select  1   0:32 100.00% ring_c
>>  5931 phargrov    2  20    0 77844K  4856K select  0   0:00  0.00% orterun
>>  5932 phargrov    2  24    0 51652K  4960K select  0   0:00  0.00% ring_c
>>
>> A backtrace for the 100%-cpu ring_c process:
>> (gdb) where
>> #0  0x0000000800d9811c in poll () from /lib/libc.so.7
>> #1  0x0000000800ae37fe in poll () from /lib/libthr.so.3
>> #2  0x00000008013259aa in poll_dispatch ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
>> #3  0x000000080131eb50 in opal_libevent2021_event_base_loop ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
>> #4  0x000000080106395d in orte_progress_thread_engine ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
>> #5  0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
>> #6  0x0000000000000000 in ?? ()
>> Error accessing memory address 0x7fffffbfe000: Bad address.
>>
>>
>> And for the idle ring_c process:
>> (gdb) where
>> #0  0x0000000800d9811c in poll () from /lib/libc.so.7
>> #1  0x0000000800ae37fe in poll () from /lib/libthr.so.3
>> #2  0x00000008013259aa in poll_dispatch ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
>> #3  0x000000080131eb50 in opal_libevent2021_event_base_loop ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
>> #4  0x000000080106395d in orte_progress_thread_engine ()
>>    from
>> /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
>> #5  0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
>> #6  0x0000000000000000 in ?? ()
>> Error accessing memory address 0x7fffffbfe000: Bad address.
>>
>>
>> They look the same, but I double-checked that these are correct.
>>
>> -Paul
>>
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
