> On Nov 11, 2019, at 4:53 PM, Gilles Gouaillardet via devel <devel@lists.open-mpi.org> wrote:
>
> John,
>
> OMPI_LAZY_WAIT_FOR_COMPLETION(active)
>
> is a simple loop that periodically checks the (volatile) "active" condition, which is expected to be updated by another thread.
> So if you set your breakpoint too early, and **all** threads are stopped when this breakpoint is hit, you might experience what looks like a race condition.
> I guess a similar scenario can occur if the breakpoint is set in mpirun/orted too early and prevents the pmix (or oob/tcp) thread from sending the message to all MPI tasks.
>
> Ralph,
>
> does the v4.0.x branch still need the oob/tcp progress thread running inside the MPI app?
> or are we missing some commits (since all interactions with mpirun/orted are handled by PMIx, at least in the master branch)?

IIRC, that progress thread only runs if explicitly asked to do so by an MCA param. We don't need that code any more, as PMIx takes care of it.
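For anyone skimming the thread: the macro Gilles describes boils down to polling a volatile flag, with a short sleep between checks, until some other thread clears it. The sketch below is illustrative only (the names, the macro body, and the 100-microsecond interval are assumptions, not the actual Open MPI source), but it matches his description and the usleep/__nanosleep frames in John's backtrace further down:

    #include <stdbool.h>
    #include <unistd.h>

    /* Set to true before posting the fence; expected to be cleared by a
     * callback that runs on a PMIx/progress thread once the fence completes. */
    static volatile bool active;

    /* Rough shape of a "lazy wait": poll the flag, sleeping briefly between
     * checks.  If a breakpoint stops *all* threads before the callback thread
     * gets a chance to clear the flag, this loop never exits and the process
     * looks hung, which is the scenario Gilles describes. */
    #define LAZY_WAIT_FOR_COMPLETION(flag)  \
        do {                                \
            while ((flag)) {                \
                usleep(100);                \
            }                               \
        } while (0)

Under normal execution the flag is eventually cleared and the loop falls through; in the hang John reports below, that release never arrives for rank 0.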
> Cheers,
>
> Gilles
>
> On 11/12/2019 9:27 AM, Ralph Castain via devel wrote:
>> Hi John
>>
>> Sorry to say, but there is no way to really answer your question, as the OMPI community doesn't actively test MPIR support. I haven't seen any reports of hangs during MPI_Init from any release series, including 4.x. My guess is that it may have something to do with the debugger interactions, as opposed to being a true race condition.
>>
>> Ralph
>>
>>> On Nov 8, 2019, at 11:27 AM, John DelSignore via devel <devel@lists.open-mpi.org> wrote:
>>>
>>> Hi,
>>>
>>> An LLNL TotalView user on a Mac reported that their MPI job was hanging inside MPI_Init() when started under the control of TotalView. They were using Open MPI 4.0.1, and TotalView was using the MPIR interface (sorry, we don't support the PMIx debugging hooks yet).
>>>
>>> I was able to reproduce the hang on my own Linux system with my own build of Open MPI 4.0.1, which I built with debug symbols. As far as I can tell, there is some sort of race inside of Open MPI 4.0.1, because if I placed breakpoints at certain points in the Open MPI code, and thus changed the timing slightly, that was enough to avoid the hang.
>>>
>>> When the code hangs, it appears as if one or more MPI processes are waiting inside ompi_mpi_init() at ompi_mpi_init.c#904 for a fence to be released. In one of the runs, rank 0 was the only one that was hanging there (though I have seen runs where two ranks were hung there).
>>>
>>> Here's a backtrace of the first thread in the rank 0 process in the case where one rank was hung:
>>>
>>> d1.<> f 10.1 w
>>> >  0 __nanosleep_nocancel  PC=0x7ffff74e2efd, FP=0x7fffffffd1e0  [/lib64/libc.so.6]
>>>    1 usleep                PC=0x7ffff7513b2f, FP=0x7fffffffd200  [/lib64/libc.so.6]
>>>    2 ompi_mpi_init         PC=0x7ffff7a64009, FP=0x7fffffffd350  [/home/jdelsign/src/tools-external/openmpi-4.0.1/ompi/runtime/ompi_mpi_init.c#904]
>>>    3 PMPI_Init             PC=0x7ffff7ab0be4, FP=0x7fffffffd390  [/home/jdelsign/src/tools-external/openmpi-4.0.1-lid/ompi/mpi/c/profile/pinit.c#67]
>>>    4 main                  PC=0x00400c5e, FP=0x7fffffffd550  [/home/jdelsign/cpi.c#27]
>>>    5 __libc_start_main     PC=0x7ffff7446b13, FP=0x7fffffffd610  [/lib64/libc.so.6]
>>>    6 _start                PC=0x00400b04, FP=0x7fffffffd618  [/amd/home/jdelsign/cpi]
>>>
>>> Here's the block of code where the thread is hung:
>>>
>>>     /* if we executed the above fence in the background, then
>>>      * we have to wait here for it to complete. However, there
>>>      * is no reason to do two barriers!
>>>      */
>>>     if (background_fence) {
>>>         OMPI_LAZY_WAIT_FOR_COMPLETION(active);
>>>     } else if (!ompi_async_mpi_init) {
>>>         /* wait for everyone to reach this point - this is a hard
>>>          * barrier requirement at this time, though we hope to relax
>>>          * it at a later point */
>>>         if (NULL != opal_pmix.fence_nb) {
>>>             active = true;
>>>             OPAL_POST_OBJECT(&active);
>>>             if (OMPI_SUCCESS != (ret = opal_pmix.fence_nb(NULL, false,
>>>                                fence_release, (void*)&active))) {
>>>                 error = "opal_pmix.fence_nb() failed";
>>>                 goto error;
>>>             }
>>>             OMPI_LAZY_WAIT_FOR_COMPLETION(active);  <<<<----- STUCK HERE WAITING FOR THE FENCE TO BE RELEASED
>>>         } else {
>>>             if (OMPI_SUCCESS != (ret = opal_pmix.fence(NULL, false))) {
>>>                 error = "opal_pmix.fence() failed";
>>>                 goto error;
>>>             }
>>>         }
>>>     }
>>>
>>> And here is an aggregated backtrace of all of the processes and threads in the job:
>>>
>>> d1.<> f g w -g f+l
>>> +/
>>>  +__clone : 5:12[0-3.2-3, p1.2-5]
>>>  |+start_thread
>>>  | +listen_thread@oob_tcp_listener.c#705 : 1:1[p1.5]
>>>  | |+__select_nocancel
>>>  | +listen_thread@ptl_base_listener.c#214 : 1:1[p1.3]
>>>  | |+__select_nocancel
>>>  | +progress_engine@opal_progress_threads.c#105 : 5:5[0-3.2, p1.4]
>>>  | |+opal_libevent2022_event_base_loop@event.c#1632
>>>  | | +poll_dispatch@poll.c#167
>>>  | |  +__poll_nocancel
>>>  | +progress_engine@pmix_progress_threads.c#108 : 5:5[0-3.3, p1.2]
>>>  |  +opal_libevent2022_event_base_loop@event.c#1632
>>>  |   +epoll_dispatch@epoll.c#409
>>>  |    +__epoll_wait_nocancel
>>>  +_start : 5:5[0-3.1, p1.1]
>>>   +__libc_start_main
>>>    +main@cpi.c#27 : 4:4[0-3.1]
>>>    |+PMPI_Init@pinit.c#67
>>>    | +ompi_mpi_init@ompi_mpi_init.c#890 : 3:3[1-3.1]  <<<<---- THE 3 OTHER MPI PROCS MADE IT PAST FENCE
>>>    | |+ompi_rte_wait_for_debugger@rte_orte_module.c#196
>>>    | | +opal_progress@opal_progress.c#251
>>>    | |  +opal_progress_events@opal_progress.c#191
>>>    | |   +opal_libevent2022_event_base_loop@event.c#1632
>>>    | |    +poll_dispatch@poll.c#167
>>>    | |     +__poll_nocancel
>>>    | +ompi_mpi_init@ompi_mpi_init.c#904 : 1:1[0.1]  <<<<---- THE THREAD THAT IS STUCK
>>>    |  +usleep
>>>    |   +__nanosleep_nocancel
>>>    +main@main.c#14 : 1:1[p1.1]
>>>     +orterun@orterun.c#200
>>>      +opal_libevent2022_event_base_loop@event.c#1632
>>>       +poll_dispatch@poll.c#167
>>>        +__poll_nocancel
>>>
>>> d1.<>
>>>
>>> I have tested Open MPI 4.0.2 dozens of times, and the hang does not seem to happen. My concern is that if the problem is indeed a race, then it's /possible/ (but perhaps not likely) that the same race exists in Open MPI 4.0.2, but the timing could be slightly different such that it doesn't hang using my simple test setup.
>>> In other words, maybe I've just been "lucky" with my testing of Open MPI 4.0.2 and have failed to provoke the hang yet.
>>>
>>> My question is: Was this a known problem in Open MPI 4.0.1 that was fixed in Open MPI 4.0.2?
>>>
>>> Thanks, John D.
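For anyone who wants to poke at this failure mode without a full Open MPI build, the following is a small self-contained toy that mirrors the shape of the block John quotes above. The pthread and the helper names are stand-ins for the PMIx progress thread and opal_pmix.fence_nb(); they are illustrative, not the real API. Commenting out the fence_release() call, or stopping both threads in a debugger before it runs, leaves the main thread spinning in much the same way rank 0 is stuck at ompi_mpi_init.c#904:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile bool active;

    /* Stand-in for fence_release(): clears the flag the main thread polls.
     * In Open MPI this runs once the fence completes. */
    static void fence_release(void *cbdata)
    {
        *(volatile bool *)cbdata = false;
    }

    /* Stand-in for the PMIx progress thread.  In the hang described above,
     * either this thread is stopped by the debugger before it can deliver
     * the release, or mpirun/orted never sends the message at all. */
    static void *progress_thread(void *arg)
    {
        sleep(1);               /* pretend the fence takes a moment */
        fence_release(arg);     /* comment this out to reproduce the "hang" */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        /* Mirrors the quoted block: set the flag, post the non-blocking
         * "fence", then lazy-wait for the callback to clear it. */
        active = true;
        if (pthread_create(&tid, NULL, progress_thread, (void *)&active) != 0) {
            perror("pthread_create");
            return 1;
        }

        while (active) {        /* OMPI_LAZY_WAIT_FOR_COMPLETION(active) */
            usleep(100);
        }

        pthread_join(tid, NULL);
        puts("fence released");
        return 0;
    }

Building it with cc -pthread and attaching a debugger makes it easy to see how stopping every thread at the poll loop, before the worker has delivered the release, is indistinguishable from a hang.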