I should have also clarified: the prior fixes are indeed in the current master.

> On Oct 28, 2015, at 12:42 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Nope - I was wrong. The correction on the client side consisted of attempting
> to time out if the blocking recv failed. We then modified the blocking
> send/recv so they would handle errors.
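> 
> Just to illustrate what I mean (this is only a sketch, not the actual PMIx
> code - the helper name is made up): the client-side blocking recv can be
> bounded with SO_RCVTIMEO so it gives up instead of hanging forever:
> 
> #include <errno.h>
> #include <sys/socket.h>
> #include <sys/time.h>
> #include <sys/types.h>
> 
> /* hypothetical helper: block until 'size' bytes arrive on 'sd', but give
>  * up if the peer stays silent longer than 'secs' seconds. Returns 0 on
>  * success, -1 on timeout or hard error */
> static int recv_blocking_with_timeout(int sd, void *buf, size_t size, int secs)
> {
>     struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };
>     char *ptr = (char *)buf;
>     size_t got = 0;
> 
>     /* ask the kernel to interrupt recv() after 'secs' seconds of inactivity */
>     if (setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
>         return -1;
>     }
>     while (got < size) {
>         ssize_t rc = recv(sd, ptr + got, size - got, 0);
>         if (rc > 0) {
>             got += (size_t)rc;
>         } else if (0 == rc) {
>             return -1;                       /* peer closed the socket */
>         } else if (EINTR == errno) {
>             continue;                        /* interrupted - just retry */
>         } else if (EAGAIN == errno || EWOULDBLOCK == errno) {
>             return -1;                       /* the SO_RCVTIMEO expired */
>         } else {
>             return -1;                       /* hard error */
>         }
>     }
>     return 0;
> }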
> 
> So that problem occurred -after- the server had correctly called accept. The 
> listener code is in 
> opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
> 
> It looks to me like the only way we could drop the accept (assuming the OS 
> doesn’t lose it) is if the file descriptor lies outside the expected range 
> once we fall out of select:
> 
> 
>         /* Spin accepting connections until all active listen sockets
>          * do not have any incoming connections, pushing each connection
>          * onto the event queue for processing
>          */
>         do {
>             accepted_connections = 0;
>             /* according to the man pages, select replaces the given descriptor
>              * set with a subset consisting of those descriptors that are ready
>              * for the specified operation - in this case, a read. So we need to
>              * first check to see if this file descriptor is included in the
>              * returned subset
>              */
>             if (0 == FD_ISSET(pmix_server_globals.listen_socket, &readfds)) {
>                 /* this descriptor is not included */
>                 continue;
>             }
> 
>             /* this descriptor is ready to be read, which means a connection
>              * request has been received - so harvest it. All we want to do
>              * here is accept the connection and push the info onto the event
>              * library for subsequent processing - we don't want to actually
>              * process the connection here as it takes too long, and so the
>              * OS might start rejecting connections due to timeout.
>              */
>             pending_connection = PMIX_NEW(pmix_pending_connection_t);
>             event_assign(&pending_connection->ev, pmix_globals.evbase, -1,
>                          EV_WRITE, connection_handler, pending_connection);
>             pending_connection->sd = accept(pmix_server_globals.listen_socket,
>                                             (struct sockaddr*)&(pending_connection->addr),
>                                             &addrlen);
>             if (pending_connection->sd < 0) {
>                 PMIX_RELEASE(pending_connection);
>                 if (pmix_socket_errno != EAGAIN ||
>                     pmix_socket_errno != EWOULDBLOCK) {
>                     if (EMFILE == pmix_socket_errno) {
>                         PMIX_ERROR_LOG(PMIX_ERR_OUT_OF_RESOURCE);
>                     } else {
>                         pmix_output(0, "listen_thread: accept() failed: %s (%d).",
>                                     strerror(pmix_socket_errno), pmix_socket_errno);
>                     }
>                     goto done;
>                 }
>                 continue;
>             }
> 
>             pmix_output_verbose(8, pmix_globals.debug_output,
>                                 "listen_thread: new connection: (%d, %d)",
>                                 pending_connection->sd, pmix_socket_errno);
>             /* activate the event */
>             event_active(&pending_connection->ev, EV_WRITE, 1);
>             accepted_connections++;
>         } while (accepted_connections > 0);
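> 
> For context, the outer loop that feeds the snippet above looks roughly like
> the sketch below (my paraphrase, not the actual pmix_server_listener.c - the
> function name, the shutdown flag, and the 2-second timeout are invented).
> The point is that select() overwrites readfds with the ready subset, so the
> set has to be re-armed before every call:
> 
> #include <errno.h>
> #include <sys/select.h>
> #include <sys/time.h>
> 
> /* hypothetical sketch of a listen-thread outer loop. 'listen_sd' is the
>  * listening socket; 'active' is cleared when the server shuts down.
>  * Returns 0 on clean shutdown, -1 on a hard select() error */
> static int listen_loop(int listen_sd, volatile int *active)
> {
>     while (*active) {
>         fd_set readfds;
>         struct timeval timeout = { .tv_sec = 2, .tv_usec = 0 };
> 
>         /* re-arm the set - select() replaces it with the ready subset */
>         FD_ZERO(&readfds);
>         FD_SET(listen_sd, &readfds);
> 
>         int rc = select(listen_sd + 1, &readfds, NULL, NULL, &timeout);
>         if (rc < 0) {
>             if (EINTR == errno) {
>                 continue;        /* interrupted by a signal - just retry */
>             }
>             return -1;           /* hard error - stop listening */
>         }
>         if (0 == rc) {
>             continue;            /* timeout - recheck the shutdown flag */
>         }
> 
>         /* the do/while accept loop quoted above runs here, harvesting
>          * connections until accept() reports EAGAIN/EWOULDBLOCK */
>     }
>     return 0;
> }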
> 
> 
>> On Oct 28, 2015, at 12:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>> Looking at the code, it appears that a fix was committed for this problem, 
>> and that we correctly resolved the issue found by Paul. The problem is that 
>> the fix didn’t get upstreamed, and so it was lost the next time we refreshed 
>> PMIx. Sigh.
>> 
>> Let me try to recreate the fix and have you take a gander at it.
>> 
>> 
>>> On Oct 28, 2015, at 12:22 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> Here is the discussion - I'm afraid it is fairly lengthy. Ignore the hwloc
>>> references in it, as that was a separate issue:
>>> 
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>> 
>>> It definitely sounds like the same issue creeping in again. I’d appreciate 
>>> any thoughts on how to correct it. If it helps, you could look at the PMIx 
>>> master - there are standalone tests in the test/simple directory that 
>>> fork/exec a child and just do the connection.
>>> 
>>> https://github.com/pmix/master
>>> 
>>> The test server is simptest.c - it will spawn a single copy of simpclient.c 
>>> by default.
>>> 
>>> 
>>>> On Oct 27, 2015, at 10:14 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> 
>>>> Interesting. Do you have a pointer to the commit (and/or to the
>>>> discussion)?
>>>> 
>>>> I looked at the PMIx code, and I have identified a few issues, but
>>>> unfortunately none of them seem to fix the problem for good. However, now
>>>> I need more than 1000 runs to get a deadlock (instead of a few tens).
>>>> 
>>>> Looking with "netstat -ax" at the status of the UDS while the processes
>>>> are deadlocked, I see 2 UDS with the same name: one from the server, which
>>>> is in the LISTEN state, and one for the client, which is in the CONNECTING
>>>> state (while the client has already sent a message into the socket and is
>>>> now waiting in a blocking receive). This somehow suggests that the server
>>>> has not yet called accept on the UDS. Unfortunately, there are 3 threads
>>>> all doing different flavors of event_base and select, so I have a hard
>>>> time tracking the path of the UDS on the server side.
>>>> 
>>>> So in order to validate my assumption I wrote a minimalistic UDS client
>>>> and server application and tried different scenarios. The conclusion is
>>>> that in order to see the same type of output from "netstat -ax" I have to
>>>> call listen on the server, connect on the client, and not call accept on
>>>> the server.
>>>> 
>>>> While I was at it, I also confirmed that the UDS holds the data that was
>>>> sent, so there is no need for further synchronization in the case where
>>>> the data is sent first. We only need to find out how the server forgets to
>>>> call accept.
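>>>> 
>>>> For reference, a self-contained version of that experiment looks roughly
>>>> like the sketch below (a hypothetical reconstruction, not the actual test
>>>> program - the socket path is made up). The parent listens but never
>>>> accepts; the child's connect() and send() both succeed, and it then hangs
>>>> in recv() just like the deadlocked client:
>>>> 
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <string.h>
>>>> #include <sys/socket.h>
>>>> #include <sys/un.h>
>>>> #include <unistd.h>
>>>> 
>>>> #define SOCK_PATH "/tmp/uds_accept_test"
>>>> 
>>>> int main(void)
>>>> {
>>>>     struct sockaddr_un addr;
>>>>     memset(&addr, 0, sizeof(addr));
>>>>     addr.sun_family = AF_UNIX;
>>>>     strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
>>>> 
>>>>     /* server: bind + listen, but never call accept() */
>>>>     unlink(SOCK_PATH);
>>>>     int srv = socket(AF_UNIX, SOCK_STREAM, 0);
>>>>     if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
>>>>         listen(srv, 8) < 0) {
>>>>         perror("server setup");
>>>>         return 1;
>>>>     }
>>>> 
>>>>     if (0 == fork()) {
>>>>         /* client: connect() succeeds (the kernel queues the connection),
>>>>          * the send() is buffered, and recv() then blocks forever -
>>>>          * "netstat -ax" now shows the LISTEN/CONNECTING pair for
>>>>          * SOCK_PATH */
>>>>         int cli = socket(AF_UNIX, SOCK_STREAM, 0);
>>>>         if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
>>>>             perror("connect");
>>>>             exit(1);
>>>>         }
>>>>         send(cli, "hello", 5, 0);
>>>>         char buf[16];
>>>>         recv(cli, buf, sizeof(buf), 0);   /* hangs: nobody will answer */
>>>>         exit(0);
>>>>     }
>>>> 
>>>>     pause();   /* server just sits here without ever accepting */
>>>>     return 0;
>>>> }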
>>>> 
>>>>   George.
>>>> 
>>>> 
>>>> 
>>>> On Tue, Oct 27, 2015 at 7:52 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Hmmm…this looks like it might be that problem we previously saw, where the
>>>> blocking recv hangs in a proc when the blocking send tries to send before
>>>> the domain socket is actually ready, and so the send fails on the other
>>>> end. As I recall, it was something to do with the socket options - and
>>>> then Paul had a problem on some of his machines, and we backed it out?
>>>> 
>>>> I wonder if that’s what is biting us here again, and what we need is to 
>>>> either remove the blocking send/recv’s altogether, or figure out a way to 
>>>> wait until the socket is really ready.
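>>>> 
>>>> One possible shape for the "wait until it is really ready" option, purely
>>>> as a sketch (a hypothetical helper, not a proposed patch): start the
>>>> connect non-blocking, wait for the socket to become writable, and check
>>>> SO_ERROR before issuing the first blocking send/recv:
>>>> 
>>>> #include <errno.h>
>>>> #include <fcntl.h>
>>>> #include <sys/select.h>
>>>> #include <sys/socket.h>
>>>> #include <sys/time.h>
>>>> 
>>>> /* hypothetical helper: start a non-blocking connect on 'sd' and wait up
>>>>  * to 'secs' seconds for it to actually complete. Returns 0 once the
>>>>  * socket is ready for the usual blocking traffic, -1 otherwise */
>>>> static int connect_and_wait(int sd, const struct sockaddr *addr,
>>>>                             socklen_t len, int secs)
>>>> {
>>>>     int flags = fcntl(sd, F_GETFL, 0);
>>>>     fcntl(sd, F_SETFL, flags | O_NONBLOCK);
>>>> 
>>>>     if (connect(sd, addr, len) < 0 && EINPROGRESS != errno) {
>>>>         return -1;                      /* immediate failure */
>>>>     }
>>>> 
>>>>     /* the socket becomes writable once the connect has finished */
>>>>     fd_set wfds;
>>>>     FD_ZERO(&wfds);
>>>>     FD_SET(sd, &wfds);
>>>>     struct timeval tv = { .tv_sec = secs, .tv_usec = 0 };
>>>>     if (select(sd + 1, NULL, &wfds, NULL, &tv) <= 0) {
>>>>         return -1;                      /* timeout or select error */
>>>>     }
>>>> 
>>>>     /* writable does not mean success - check the pending error code */
>>>>     int err = 0;
>>>>     socklen_t elen = sizeof(err);
>>>>     if (getsockopt(sd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0 || 0 != err) {
>>>>         return -1;                      /* the connect actually failed */
>>>>     }
>>>> 
>>>>     fcntl(sd, F_SETFL, flags);          /* restore blocking mode */
>>>>     return 0;
>>>> }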
>>>> 
>>>> Any thoughts?
>>>> 
>>>> 
>>>>> On Oct 27, 2015, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>> It appears the branch solves the problem at least partially. I asked one
>>>>> of my students to hammer it pretty badly, and he reported that the
>>>>> deadlocks still occur. He also graciously provided some stack traces:
>>>>> 
>>>>> #0  0x00007f4bd5274aed in nanosleep () from /lib64/libc.so.6
>>>>> #1  0x00007f4bd52a9c94 in usleep () from /lib64/libc.so.6
>>>>> #2  0x00007f4bd2e42b00 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7fff3c561960, ninfo=1) at src/client/pmix_client_fence.c:100
>>>>> #3  0x00007f4bd306e6d2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:306
>>>>> #4  0x00007f4bd57d5cc3 in ompi_mpi_init (argc=3, argv=0x7fff3c561ea8, requested=3, provided=0x7fff3c561d84) at runtime/ompi_mpi_init.c:644
>>>>> #5  0x00007f4bd5813399 in PMPI_Init_thread (argc=0x7fff3c561d7c, argv=0x7fff3c561d70, required=3, provided=0x7fff3c561d84) at pinit_thread.c:69
>>>>> #6  0x0000000000401516 in main (argc=3, argv=0x7fff3c561ea8) at osu_mbw_mr.c:86
>>>>> 
>>>>> And another process:
>>>>> 
>>>>> #0  0x00007f7b9d7d8bdc in recv () from /lib64/libpthread.so.0
>>>>> #1  0x00007f7b9b0aa42d in opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, data=0x7ffd62139004 "", size=4) at src/usock/usock.c:168
>>>>> #2  0x00007f7b9b0af5d9 in recv_connect_ack (sd=13) at src/client/pmix_client.c:844
>>>>> #3  0x00007f7b9b0b085e in usock_connect (addr=0x7ffd62139330) at src/client/pmix_client.c:1110
>>>>> #4  0x00007f7b9b0acc24 in connect_to_server (address=0x7ffd62139330, cbdata=0x7ffd621390e0) at src/client/pmix_client.c:181
>>>>> #5  0x00007f7b9b0ad569 in OPAL_PMIX_PMIX1XX_PMIx_Init (proc=0x7f7b9b4e9b60) at src/client/pmix_client.c:362
>>>>> #6  0x00007f7b9b2dbd9d in pmix1_client_init () at pmix1_client.c:99
>>>>> #7  0x00007f7b9b4eb95f in pmi_component_query (module=0x7ffd62139490, priority=0x7ffd6213948c) at ess_pmi_component.c:90
>>>>> #8  0x00007f7b9ce70ec5 in mca_base_select (type_name=0x7f7b9d20e059 "ess", output_id=-1, components_available=0x7f7b9d431eb0, best_module=0x7ffd621394d0, best_component=0x7ffd621394d8, priority_out=0x0) at mca_base_components_select.c:77
>>>>> #9  0x00007f7b9d1a956b in orte_ess_base_select () at base/ess_base_select.c:40
>>>>> #10 0x00007f7b9d160449 in orte_init (pargc=0x0, pargv=0x0, flags=32) at runtime/orte_init.c:219
>>>>> #11 0x00007f7b9da4377a in ompi_mpi_init (argc=3, argv=0x7ffd621397f8, requested=3, provided=0x7ffd621396d4) at runtime/ompi_mpi_init.c:488
>>>>> #12 0x00007f7b9da81399 in PMPI_Init_thread (argc=0x7ffd621396cc, argv=0x7ffd621396c0, required=3, provided=0x7ffd621396d4) at pinit_thread.c:69
>>>>> #13 0x0000000000401516 in main (argc=3, argv=0x7ffd621397f8) at osu_mbw_mr.c:86
>>>>> 
>>>>>   George.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Oct 27, 2015 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> I haven’t been able to replicate this when using the branch in this PR:
>>>>> 
>>>>> https://github.com/open-mpi/ompi/pull/1073
>>>>> 
>>>>> Would you mind giving it a try? It fixes some other race conditions and 
>>>>> might pick this one up too.
>>>>> 
>>>>> 
>>>>>> On Oct 27, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>> Okay, I’ll take a look - I’ve been chasing a race condition that might 
>>>>>> be related
>>>>>> 
>>>>>>> On Oct 27, 2015, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>> 
>>>>>>> No, it's using 2 nodes.
>>>>>>>   George.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> Is this on a single node?
>>>>>>> 
>>>>>>>> On Oct 27, 2015, at 9:25 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>> 
>>>>>>>> I get intermittent deadlocks with the latest trunk. The smallest
>>>>>>>> reproducer is a shell for loop around a small (2 processes), short (20
>>>>>>>> seconds) MPI application. After a few tens of iterations, MPI_Init
>>>>>>>> will deadlock with the following backtrace:
>>>>>>>> 
>>>>>>>> #0  0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6
>>>>>>>> #1  0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6
>>>>>>>> #2  0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7ffd7934fb90, ninfo=1) at src/client/pmix_client_fence.c:100
>>>>>>>> #3  0x00007fa9498376a2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:305
>>>>>>>> #4  0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, argv=0x7ffd793500a8, requested=3, provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645
>>>>>>>> #5  0x00007fa94bb77281 in PMPI_Init_thread (argc=0x7ffd7934ff8c, argv=0x7ffd7934ff80, required=3, provided=0x7ffd7934ff94) at pinit_thread.c:69
>>>>>>>> #6  0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at osu_mbw_mr.c:86
>>>>>>>> 
>>>>>>>> On my machines this is reproducible at 100% after anywhere between 50 
>>>>>>>> and 100 iterations.
>>>>>>>> 
>>>>>>>>   Thanks,
>>>>>>>>     George.
>>>>>>>> 
>> 
> 
