Here is the discussion - I'm afraid it is fairly lengthy. Ignore the hwloc 
references in it, as that was a separate issue:

http://www.open-mpi.org/community/lists/devel/2015/09/18074.php

It definitely sounds like the same issue creeping in again. I’d appreciate any 
thoughts on how to correct it. If it helps, you could look at the PMIx master - 
there are standalone tests in the test/simple directory that fork/exec a child 
and just do the connection.

https://github.com/pmix/master

The test server is simptest.c - it will spawn a single copy of simpclient.c by 
default.


> On Oct 27, 2015, at 10:14 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
> Interesting. Do you have a pointer to the commit (and/or to the discussion)?
> 
> I looked at the PMIx code and identified a few issues, but unfortunately none 
> of the fixes seems to solve the problem for good. However, I now need more 
> than 1000 runs to get a deadlock (instead of a few tens).
> 
> Looking with "netstat -ax" at the status of the UDS while the processes are 
> deadlocked, I see 2 UDS with the same name: one from the server, which is in 
> LISTEN state, and one from the client, which is in CONNECTING state (while 
> the client has already sent a message into the socket and is now waiting in 
> a blocking receive). This suggests that the server has not yet called accept 
> on the UDS. Unfortunately, there are 3 threads all doing different flavors of 
> event_base and select, so I have a hard time tracking the path of the UDS on 
> the server side.
> 
> So in order to validate my assumption I wrote a minimalistic UDS client and 
> server application and tried different scenarios. The conclusion is that in 
> order to see the same type of output from "netstat -ax" I have to call listen 
> on the server, connect on the client, and not call accept on the server.
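> 
> For reference, a minimal sketch of that listen-without-accept scenario (not 
> the actual test program; the socket path and message are made up, and error 
> checking is omitted) - the server listens on a UDS but never accepts, while 
> the client connects, sends, and blocks in recv:
> 
>     /* uds_noaccept.c - run "./uds_noaccept server &" then "./uds_noaccept",
>      * and compare "netstat -ax" against the deadlocked processes. */
>     #include <string.h>
>     #include <unistd.h>
>     #include <sys/socket.h>
>     #include <sys/un.h>
> 
>     #define SOCK_PATH "/tmp/uds_noaccept.sock"
> 
>     int main(int argc, char **argv)
>     {
>         struct sockaddr_un addr;
>         int fd = socket(AF_UNIX, SOCK_STREAM, 0);
> 
>         memset(&addr, 0, sizeof(addr));
>         addr.sun_family = AF_UNIX;
>         strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
> 
>         if (argc > 1 && 0 == strcmp(argv[1], "server")) {
>             unlink(SOCK_PATH);
>             bind(fd, (struct sockaddr *)&addr, sizeof(addr));
>             listen(fd, 8);
>             pause();                /* never call accept(): stays in LISTEN */
>         } else {
>             char buf[4];
>             connect(fd, (struct sockaddr *)&addr, sizeof(addr));
>             send(fd, "ping", 4, 0); /* lands in the socket buffer */
>             recv(fd, buf, sizeof(buf), MSG_WAITALL); /* blocks forever */
>         }
>         return 0;
>     }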
> 
> On the same occasion I also confirmed that the UDS holds the data that was 
> sent, so there is no need for further synchronization in the case where the 
> data is sent first. We only need to find out how the server forgets to call 
> accept.
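> 
> A similarly minimal sketch of the buffering check (again just an 
> illustration with made-up names, error checking omitted): the client writes 
> before the server ever calls accept, and the data is still delivered once 
> the accept finally happens:
> 
>     /* uds_buffer.c - data sent before accept() is held in the socket. */
>     #include <stdio.h>
>     #include <string.h>
>     #include <unistd.h>
>     #include <sys/socket.h>
>     #include <sys/un.h>
>     #include <sys/wait.h>
> 
>     #define SOCK_PATH "/tmp/uds_buffer.sock"
> 
>     int main(void)
>     {
>         struct sockaddr_un addr;
>         int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
> 
>         memset(&addr, 0, sizeof(addr));
>         addr.sun_family = AF_UNIX;
>         strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
>         unlink(SOCK_PATH);
>         bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
>         listen(lfd, 8);
> 
>         if (0 == fork()) {                 /* child acts as the client */
>             int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
>             connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
>             send(cfd, "hello", 5, 0);      /* sent before any accept() */
>             close(cfd);
>             _exit(0);
>         }
> 
>         sleep(2);                          /* delay the accept on purpose */
>         int sfd = accept(lfd, NULL, NULL);
>         char buf[6] = "";
>         recv(sfd, buf, 5, MSG_WAITALL);    /* the queued "hello" arrives */
>         printf("server received: %s\n", buf);
>         wait(NULL);
>         return 0;
>     }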
> 
>   George.
> 
> 
> 
> On Tue, Oct 27, 2015 at 7:52 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Hmmm…this looks like it might be the problem we previously saw, where the 
> blocking recv hangs in a proc when the blocking send tries to send before the 
> domain socket is actually ready, and so the send fails on the other end. As I 
> recall, it had something to do with the socket options - and then Paul had a 
> problem on some of his machines, and we backed it out?
> 
> I wonder if that's what is biting us here again, and whether we need to 
> either remove the blocking send/recvs altogether or figure out a way to 
> wait until the socket is really ready.
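> 
> For the second option, one generic pattern (only a sketch of the idea, not 
> the PMIx usock code - the helper name and timeout handling are made up) is 
> to poll the descriptor with a timeout before the blocking recv, so a 
> handshake that never arrives becomes a recoverable error instead of a hang:
> 
>     #include <errno.h>
>     #include <poll.h>
>     #include <sys/socket.h>
> 
>     /* Wait up to timeout_ms for data, then receive; returns -1 with
>      * errno == ETIMEDOUT if the peer never becomes readable. */
>     ssize_t recv_with_timeout(int sd, void *buf, size_t len, int timeout_ms)
>     {
>         struct pollfd pfd = { .fd = sd, .events = POLLIN };
>         int rc;
> 
>         do {
>             rc = poll(&pfd, 1, timeout_ms);
>         } while (rc < 0 && errno == EINTR);
> 
>         if (0 == rc) {          /* peer never became readable */
>             errno = ETIMEDOUT;
>             return -1;
>         }
>         if (rc < 0) {
>             return -1;
>         }
>         /* A fully robust version would loop until len bytes arrive. */
>         return recv(sd, buf, len, MSG_WAITALL);
>     }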
> 
> Any thoughts?
> 
> 
>> On Oct 27, 2015, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>> It appears the branch solves the problem at least partially. I asked one of 
>> my students to hammer it pretty hard, and he reported that the deadlocks 
>> still occur. He also graciously provided some stack traces:
>> 
>> #0  0x00007f4bd5274aed in nanosleep () from /lib64/libc.so.6
>> #1  0x00007f4bd52a9c94 in usleep () from /lib64/libc.so.6
>> #2  0x00007f4bd2e42b00 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7fff3c561960, ninfo=1) at src/client/pmix_client_fence.c:100
>> #3  0x00007f4bd306e6d2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:306
>> #4  0x00007f4bd57d5cc3 in ompi_mpi_init (argc=3, argv=0x7fff3c561ea8, requested=3, provided=0x7fff3c561d84) at runtime/ompi_mpi_init.c:644
>> #5  0x00007f4bd5813399 in PMPI_Init_thread (argc=0x7fff3c561d7c, argv=0x7fff3c561d70, required=3, provided=0x7fff3c561d84) at pinit_thread.c:69
>> #6  0x0000000000401516 in main (argc=3, argv=0x7fff3c561ea8) at osu_mbw_mr.c:86
>> 
>> And another process:
>> 
>> #0  0x00007f7b9d7d8bdc in recv () from /lib64/libpthread.so.0
>> #1  0x00007f7b9b0aa42d in opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, data=0x7ffd62139004 "", size=4) at src/usock/usock.c:168
>> #2  0x00007f7b9b0af5d9 in recv_connect_ack (sd=13) at src/client/pmix_client.c:844
>> #3  0x00007f7b9b0b085e in usock_connect (addr=0x7ffd62139330) at src/client/pmix_client.c:1110
>> #4  0x00007f7b9b0acc24 in connect_to_server (address=0x7ffd62139330, cbdata=0x7ffd621390e0) at src/client/pmix_client.c:181
>> #5  0x00007f7b9b0ad569 in OPAL_PMIX_PMIX1XX_PMIx_Init (proc=0x7f7b9b4e9b60) at src/client/pmix_client.c:362
>> #6  0x00007f7b9b2dbd9d in pmix1_client_init () at pmix1_client.c:99
>> #7  0x00007f7b9b4eb95f in pmi_component_query (module=0x7ffd62139490, priority=0x7ffd6213948c) at ess_pmi_component.c:90
>> #8  0x00007f7b9ce70ec5 in mca_base_select (type_name=0x7f7b9d20e059 "ess", output_id=-1, components_available=0x7f7b9d431eb0, best_module=0x7ffd621394d0, best_component=0x7ffd621394d8, priority_out=0x0) at mca_base_components_select.c:77
>> #9  0x00007f7b9d1a956b in orte_ess_base_select () at base/ess_base_select.c:40
>> #10 0x00007f7b9d160449 in orte_init (pargc=0x0, pargv=0x0, flags=32) at runtime/orte_init.c:219
>> #11 0x00007f7b9da4377a in ompi_mpi_init (argc=3, argv=0x7ffd621397f8, requested=3, provided=0x7ffd621396d4) at runtime/ompi_mpi_init.c:488
>> #12 0x00007f7b9da81399 in PMPI_Init_thread (argc=0x7ffd621396cc, argv=0x7ffd621396c0, required=3, provided=0x7ffd621396d4) at pinit_thread.c:69
>> #13 0x0000000000401516 in main (argc=3, argv=0x7ffd621397f8) at osu_mbw_mr.c:86
>> 
>>   George.
>> 
>> 
>> 
>> On Tue, Oct 27, 2015 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> I haven’t been able to replicate this when using the branch in this PR:
>> 
>> https://github.com/open-mpi/ompi/pull/1073
>> 
>> Would you mind giving it a try? It fixes some other race conditions and 
>> might pick this one up too.
>> 
>> 
>>> On Oct 27, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> Okay, I’ll take a look - I’ve been chasing a race condition that might be 
>>> related
>>> 
>>>> On Oct 27, 2015, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> 
>>>> No, it's using 2 nodes.
>>>>   George.
>>>> 
>>>> 
>>>> On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Is this on a single node?
>>>> 
>>>>> On Oct 27, 2015, at 9:25 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>> I get intermittent deadlocks with the latest trunk. The smallest 
>>>>> reproducer is a shell for loop around a small (2-process), short 
>>>>> (20-second) MPI application. After a few tens of iterations, MPI_Init 
>>>>> will deadlock with the following backtrace:
>>>>> 
>>>>> #0  0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6
>>>>> #1  0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6
>>>>> #2  0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7ffd7934fb90, ninfo=1) at src/client/pmix_client_fence.c:100
>>>>> #3  0x00007fa9498376a2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:305
>>>>> #4  0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, argv=0x7ffd793500a8, requested=3, provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645
>>>>> #5  0x00007fa94bb77281 in PMPI_Init_thread (argc=0x7ffd7934ff8c, argv=0x7ffd7934ff80, required=3, provided=0x7ffd7934ff94) at pinit_thread.c:69
>>>>> #6  0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at osu_mbw_mr.c:86
>>>>> 
>>>>> On my machines this is reproducible at 100% after anywhere between 50 and 
>>>>> 100 iterations.
>>>>> 
>>>>>   Thanks,
>>>>>     George.