Here is the discussion - I'm afraid it is fairly lengthy. Ignore the hwloc references in it, as those were a separate issue:
http://www.open-mpi.org/community/lists/devel/2015/09/18074.php

It definitely sounds like the same issue creeping in again. I'd appreciate any thoughts on how to correct it. If it helps, you could look at the PMIx master - there are standalone tests in the test/simple directory that fork/exec a child and just do the connection:

https://github.com/pmix/master

The test server is simptest.c - it will spawn a single copy of simpclient.c by default.

> On Oct 27, 2015, at 10:14 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> Interesting. Do you have a pointer to the commit (and/or to the discussion)?
>
> I looked at the PMIx code and identified a few issues, but unfortunately none of them seems to fix the problem for good. However, I now need more than 1000 runs to get a deadlock (instead of a few tens).
>
> Looking with "netstat -ax" at the status of the UDS while the processes are deadlocked, I see 2 UDS with the same name: one from the server, which is in LISTEN state, and one from the client, which is in CONNECTING state (while the client has already sent a message into the socket and is now waiting in a blocking receive). This suggests that the server has not yet called accept on the UDS. Unfortunately, there are 3 threads all running different flavors of event_base and select, so I have a hard time tracking the path of the UDS on the server side.
>
> So, in order to validate my assumption, I wrote a minimalistic UDS client and server application and tried different scenarios. The conclusion is that to see the same type of output from "netstat -ax" I have to call listen on the server, connect on the client, and never call accept on the server.
>
> While doing this I also confirmed that the UDS holds the data that was sent, so there is no need for further synchronization for the case where the data is sent first. We only need to find out how the server forgets to call accept.
>
> George.
>
> On Tue, Oct 27, 2015 at 7:52 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Hmmm…this looks like it might be that problem we previously saw where the blocking recv hangs in a proc when the blocking send tries to send before the domain socket is actually ready, and so the send fails on the other end. As I recall, it was something to do with the socket options - and then Paul had a problem on some of his machines, and we backed it out?
>
> I wonder if that's what is biting us here again, and whether what we need is to either remove the blocking send/recvs altogether or figure out a way to wait until the socket is really ready.
>
> Any thoughts?
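For reference, here is a minimal standalone sketch of the kind of experiment George describes above (listen on the server, connect and send on the client, never call accept). It is not the PMIx code - the socket path, payload, and timings are invented for illustration - but it should show the same "netstat -ax" picture he reports: the server end LISTENING and the client end stuck in CONNECTING, with the client blocked in recv().

    /* Minimal sketch, not the PMIx code: server listens but never accepts,
     * client connects, sends, and blocks in recv().  The socket path and
     * payload below are made up for this example. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SOCK_PATH "/tmp/uds_noaccept_demo"

    int main(void)
    {
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
        unlink(SOCK_PATH);

        /* server side: bind + listen, deliberately no accept() */
        int srv = socket(AF_UNIX, SOCK_STREAM, 0);
        if (srv < 0 ||
            bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(srv, 8) < 0) {
            perror("server setup");
            return 1;
        }

        if (fork() == 0) {
            /* client side */
            int cli = socket(AF_UNIX, SOCK_STREAM, 0);
            if (cli < 0 ||
                connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("client setup");
                return 1;
            }
            /* the send succeeds even though accept() was never called:
             * the kernel queues the data on the pending connection */
            char buf[4] = "ack";
            if (send(cli, buf, sizeof(buf), 0) < 0)
                perror("send");
            fprintf(stderr, "client: blocking in recv(); run 'netstat -ax'\n");
            recv(cli, buf, sizeof(buf), 0);   /* hangs: nobody will answer */
            return 0;
        }

        /* parent never calls accept(); while it sleeps, "netstat -ax" should
         * show the LISTEN server socket and the CONNECTING client socket
         * that George describes */
        sleep(60);
        unlink(SOCK_PATH);
        return 0;
    }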
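And, purely as an illustration of the second option above - waiting until the socket is really ready instead of assuming it is - something along these lines might be worth considering. The helper name, error set, retry count, and back-off are invented for this sketch; it is not what PMIx currently does.

    /* Hedged sketch only: retry connect() on a Unix domain socket until the
     * server is actually listening, rather than racing on the first attempt.
     * Names and limits are made up for this example. */
    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int connect_with_retry(const char *path, int max_tries)
    {
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        for (int i = 0; i < max_tries; i++) {
            int sd = socket(AF_UNIX, SOCK_STREAM, 0);
            if (sd < 0)
                return -1;
            if (connect(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                return sd;                    /* server is listening */
            /* ENOENT: rendezvous file not created yet;
             * ECONNREFUSED: file exists but listen() has not happened yet;
             * anything else is treated as a real error */
            if (errno != ENOENT && errno != ECONNREFUSED) {
                close(sd);
                return -1;
            }
            close(sd);
            usleep(10000);                    /* 10 ms between attempts */
        }
        errno = ETIMEDOUT;
        return -1;
    }

Of course, the real fix may look nothing like this - the stack traces below show the client already past connect() and stuck in recv_connect_ack, so the missing accept() on the server side still looks like the actual culprit.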
>> On Oct 27, 2015, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> It appears the branch solves the problem at least partially. I asked one of my students to hammer it pretty hard, and he reported that the deadlocks still occur. He also graciously provided some stack traces:
>>
>> #0 0x00007f4bd5274aed in nanosleep () from /lib64/libc.so.6
>> #1 0x00007f4bd52a9c94 in usleep () from /lib64/libc.so.6
>> #2 0x00007f4bd2e42b00 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7fff3c561960, ninfo=1) at src/client/pmix_client_fence.c:100
>> #3 0x00007f4bd306e6d2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:306
>> #4 0x00007f4bd57d5cc3 in ompi_mpi_init (argc=3, argv=0x7fff3c561ea8, requested=3, provided=0x7fff3c561d84) at runtime/ompi_mpi_init.c:644
>> #5 0x00007f4bd5813399 in PMPI_Init_thread (argc=0x7fff3c561d7c, argv=0x7fff3c561d70, required=3, provided=0x7fff3c561d84) at pinit_thread.c:69
>> #6 0x0000000000401516 in main (argc=3, argv=0x7fff3c561ea8) at osu_mbw_mr.c:86
>>
>> And another process:
>>
>> #0 0x00007f7b9d7d8bdc in recv () from /lib64/libpthread.so.0
>> #1 0x00007f7b9b0aa42d in opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, data=0x7ffd62139004 "", size=4) at src/usock/usock.c:168
>> #2 0x00007f7b9b0af5d9 in recv_connect_ack (sd=13) at src/client/pmix_client.c:844
>> #3 0x00007f7b9b0b085e in usock_connect (addr=0x7ffd62139330) at src/client/pmix_client.c:1110
>> #4 0x00007f7b9b0acc24 in connect_to_server (address=0x7ffd62139330, cbdata=0x7ffd621390e0) at src/client/pmix_client.c:181
>> #5 0x00007f7b9b0ad569 in OPAL_PMIX_PMIX1XX_PMIx_Init (proc=0x7f7b9b4e9b60) at src/client/pmix_client.c:362
>> #6 0x00007f7b9b2dbd9d in pmix1_client_init () at pmix1_client.c:99
>> #7 0x00007f7b9b4eb95f in pmi_component_query (module=0x7ffd62139490, priority=0x7ffd6213948c) at ess_pmi_component.c:90
>> #8 0x00007f7b9ce70ec5 in mca_base_select (type_name=0x7f7b9d20e059 "ess", output_id=-1, components_available=0x7f7b9d431eb0, best_module=0x7ffd621394d0, best_component=0x7ffd621394d8, priority_out=0x0) at mca_base_components_select.c:77
>> #9 0x00007f7b9d1a956b in orte_ess_base_select () at base/ess_base_select.c:40
>> #10 0x00007f7b9d160449 in orte_init (pargc=0x0, pargv=0x0, flags=32) at runtime/orte_init.c:219
>> #11 0x00007f7b9da4377a in ompi_mpi_init (argc=3, argv=0x7ffd621397f8, requested=3, provided=0x7ffd621396d4) at runtime/ompi_mpi_init.c:488
>> #12 0x00007f7b9da81399 in PMPI_Init_thread (argc=0x7ffd621396cc, argv=0x7ffd621396c0, required=3, provided=0x7ffd621396d4) at pinit_thread.c:69
>> #13 0x0000000000401516 in main (argc=3, argv=0x7ffd621397f8) at osu_mbw_mr.c:86
>>
>> George.
>>
>> On Tue, Oct 27, 2015 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> I haven't been able to replicate this when using the branch in this PR:
>>
>> https://github.com/open-mpi/ompi/pull/1073
>>
>> Would you mind giving it a try? It fixes some other race conditions and might pick this one up too.
>>
>>> On Oct 27, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> Okay, I'll take a look - I've been chasing a race condition that might be related.
>>>
>>>> On Oct 27, 2015, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> No, it's using 2 nodes.
>>>>
>>>> George.
>>>>
>>>> On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Is this on a single node?
>>>>
>>>>> On Oct 27, 2015, at 9:25 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>
>>>>> I get intermittent deadlocks with the latest trunk. The smallest reproducer is a shell for loop around a small (2 processes), short (20 seconds) MPI application. After a few tens of iterations, MPI_Init will deadlock with the following backtrace:
>>>>>
>>>>> #0 0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6
>>>>> #1 0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6
>>>>> #2 0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7ffd7934fb90, ninfo=1) at src/client/pmix_client_fence.c:100
>>>>> #3 0x00007fa9498376a2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:305
>>>>> #4 0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, argv=0x7ffd793500a8, requested=3, provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645
>>>>> #5 0x00007fa94bb77281 in PMPI_Init_thread (argc=0x7ffd7934ff8c, argv=0x7ffd7934ff80, required=3, provided=0x7ffd7934ff94) at pinit_thread.c:69
>>>>> #6 0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at osu_mbw_mr.c:86
>>>>>
>>>>> On my machines this is reproducible 100% of the time, after anywhere between 50 and 100 iterations.
>>>>>
>>>>> Thanks,
>>>>> George.