It seems the change suggested by Nysal also allows me to run into the next 
problem ;-)

Mark

> On 09 Nov 2015, at 20:19 , George Bosilca <bosi...@icl.utk.edu> wrote:
> 
> All 10k tests completed successfully. Nysal pinpointed the real problem 
> behind the deadlocks. :+1:
> 
>   George.
> 
> 
> On Mon, Nov 9, 2015 at 1:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Looking at it, I think I see what was happening. The thread would start, but 
> then immediately see that the active flag was false and exit. This left the 
> server without any listening thread - and it never detected that this had 
> happened. It was therefore a race between the thread checking the flag and 
> the server setting it.
> 
> Thanks Nysal - I believe this should indeed fix the problem!
> 
> 
>> On Nov 9, 2015, at 9:04 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>> Clearly Nysal has a valid point there. I launched a stress test with Nysal's 
>> suggestion in the code, and so far it's up to a few hundred iterations
>> without deadlock. I would not claim victory yet; I launched a 10k cycle to 
>> see where we stand (btw this never passed before).
>> I'll let you know the outcome.
>> 
>>   George.
>> 
>> 
>> On Mon, Nov 9, 2015 at 11:55 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>> 
>> 
>> 2015-11-09 22:42 GMT+06:00 Artem Polyakov <artpo...@gmail.com>:
>> This is a very good point, Nysal!
>> 
>> This is definitely a problem, and I can say even more: on average 3 out of 
>> every 10 tasks were affected by this bug. Once the PR 
>> (https://github.com/pmix/master/pull/8) was applied I was able to run 100 
>> test tasks without any hangs.
>> 
>> Here is some more information on my symptoms. I was observing this without 
>> OMPI, just running the pmix_client test binary from the PMIx test suite with 
>> the SLURM PMIx plugin.
>> Periodically the application was hanging. Investigation showed that not all 
>> processes were able to initialize correctly.
>> Here is what such a client's backtrace looks like:
>> 
>> P.S. I think that this backtrace may be relevant to George's problem as 
>> well. In my case not all of the processes were hanging in 
>> connect_to_server; most of them were able to move forward and reach the Fence.
>> George, was the backtrace that you posted the same on both processes, or 
>> was it a "random" one from one of them?
>>  
>> (gdb) bt
>> #0  0x00007f1448f1b7eb in recv () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00007f144914c191 in pmix_usock_recv_blocking (sd=9, 
>> data=0x7fff367f7c64 "", size=4) at src/usock/usock.c:166
>> #2  0x00007f1449152d18 in recv_connect_ack (sd=9) at 
>> src/client/pmix_client.c:837
>> #3  0x00007f14491546bf in usock_connect (addr=0x7fff367f7d60) at 
>> src/client/pmix_client.c:1103
>> #4  0x00007f144914f94c in connect_to_server (address=0x7fff367f7d60, 
>> cbdata=0x7fff367f7dd0) at src/client/pmix_client.c:179
>> #5  0x00007f1449150421 in PMIx_Init (proc=0x7fff367f81d0) at 
>> src/client/pmix_client.c:355
>> #6  0x0000000000401b97 in main (argc=9, argv=0x7fff367f83d8) at 
>> pmix_client.c:62
>> 
>> 
>> The server-side debug log has the following lines at the end of the file:
>> [cn33:00482] pmix:server register client slurm.pmix.22.0:10
>> [cn33:00482] pmix:server _register_client for nspace slurm.pmix.22.0 rank 10
>> [cn33:00482] pmix:server setup_fork for nspace slurm.pmix.22.0 rank 10
>> 
>> In normal operation, the following lines should appear after the lines above:
>> ....
>> [cn33:00188] listen_thread: new connection: (26, 0)
>> [cn33:00188] connection_handler: new connection: 26
>> [cn33:00188] RECV CONNECT ACK FROM PEER ON SOCKET 26
>> [cn33:00188] waiting for blocking recv of 16 bytes
>> [cn33:00188] blocking receive complete from remote
>> ....
>> 
>> On the client side I see the following lines:
>> [cn33:00491] usock_peer_try_connect: attempting to connect to server
>> [cn33:00491] usock_peer_try_connect: attempting to connect to server on 
>> socket 10
>> [cn33:00491] pmix: SEND CONNECT ACK
>> [cn33:00491] sec: native create_cred
>> [cn33:00491] sec: using credential 1000:1000
>> [cn33:00491] send blocking of 54 bytes to socket 10
>> [cn33:00491] blocking send complete to socket 10
>> [cn33:00491] pmix: RECV CONNECT ACK FROM SERVER
>> [cn33:00491] waiting for blocking recv of 4 bytes
>> [cn33:00491] blocking_recv received error 11:Resource temporarily 
>> unavailable from remote - cycling
>> [cn33:00491] blocking_recv received error 11:Resource temporarily 
>> unavailable from remote - cycling
>> [... repeated many times ...]
>> 
>> With the fix for the problem highlighted by Nysal, everything runs cleanly.
>> 
>> 
>> 2015-11-09 10:53 GMT+06:00 Nysal Jan K A <jny...@gmail.com>:
>> In listen_thread():
>> 194     while (pmix_server_globals.listen_thread_active) {
>> 195         FD_ZERO(&readfds);
>> 196         FD_SET(pmix_server_globals.listen_socket, &readfds);
>> 197         max = pmix_server_globals.listen_socket;
>> 
>> Is it possible that pmix_server_globals.listen_thread_active can still be 
>> false here, in which case the thread just exits and never calls accept()?
>> 
>> In pmix_start_listening():
>> 147         /* fork off the listener thread */
>> 148         if (0 > pthread_create(&engine, NULL, listen_thread, NULL)) {
>> 149             return PMIX_ERROR;
>> 150         }
>> 151         pmix_server_globals.listen_thread_active = true;
>> 
>> pmix_server_globals.listen_thread_active is set to true only after the 
>> thread is created - could this cause a race?
>> listen_thread_active might also need to be declared volatile.
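>> 
>> In other words, something along these lines (just a sketch of the ordering I 
>> have in mind; the rollback on pthread_create() failure is an assumption, not 
>> part of the existing code):
>> 
>>     /* Set the flag before spawning the thread, so listen_thread() cannot
>>      * observe a stale "false" value and exit immediately; undo it if the
>>      * thread cannot be created. */
>>     pmix_server_globals.listen_thread_active = true;
>>     if (0 > pthread_create(&engine, NULL, listen_thread, NULL)) {
>>         pmix_server_globals.listen_thread_active = false;
>>         return PMIX_ERROR;
>>     }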
>> 
>> Regards
>> --Nysal
>> 
>> On Sun, Nov 8, 2015 at 10:38 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> We had a power outage last week and the local disks on our cluster were 
>> wiped out. My tester was in there. But, I can rewrite it after SC.
>> 
>>   George.
>> 
>> On Sat, Nov 7, 2015 at 12:04 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Could you send me your stress test? I’m wondering if it is just something 
>> about how we set socket options
>> 
>> 
>>> On Nov 7, 2015, at 8:58 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>> I had to postpone this until after SC. However, I ran a UDS stress test for 
>>> 3 days, reproducing the opening and sending of data (what Ralph described in 
>>> his email), and I could never get a deadlock.
>>> 
>>>   George.
>>> 
>>> 
>>> On Sat, Nov 7, 2015 at 11:26 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> George was looking into it, but I don’t know if he has had time recently to 
>>> continue the investigation. We understand “what” is happening (accept 
>>> sometimes ignores the connection), but we don’t yet know “why”. I’ve done 
>>> some digging around the web, and found that sometimes you can try to talk 
>>> to a Unix Domain Socket too quickly - i.e., you open it and then send to 
>>> it, but the OS hasn’t yet set it up. In those cases, you can hang the 
>>> socket. However, I’ve tried adding some artificial delay, and while it 
>>> helped, it didn’t completely solve the problem.
>>> 
>>> I have an idea for a workaround (set a timer and retry after a while), but 
>>> I would obviously prefer a real solution. I’m not even sure it will work, as 
>>> it is unclear whether the server (which is the one hung in accept) will break 
>>> free if the client closes the socket and retries.
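>>> 
>>> Roughly, the kind of timed retry I have in mind on the client side (a sketch 
>>> only; the helper name and timeout value are made up):
>>> 
>>>     /* Sketch: wait a bounded time for the server's ack instead of blocking
>>>      * forever; on timeout, close the socket so the caller can reconnect. */
>>>     #include <poll.h>
>>>     #include <unistd.h>
>>> 
>>>     #define ACK_TIMEOUT_MS 2000        /* made-up value */
>>> 
>>>     static int wait_for_ack(int sd)
>>>     {
>>>         struct pollfd pfd = { .fd = sd, .events = POLLIN };
>>>         int rc = poll(&pfd, 1, ACK_TIMEOUT_MS);
>>>         if (rc <= 0) {                 /* timed out, or poll failed */
>>>             close(sd);
>>>             return -1;                 /* caller reopens the socket and retries */
>>>         }
>>>         return 0;                      /* data is ready; the blocking recv is safe */
>>>     }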
>>> 
>>> 
>>>> On Nov 6, 2015, at 10:53 PM, Artem Polyakov <artpo...@gmail.com> wrote:
>>>> 
>>>> Hello, is there any progress on this topic? This affects our PMIx 
>>>> measurements.
>>>> 
>>>> 2015-10-30 21:21 GMT+06:00 Ralph Castain <r...@open-mpi.org>:
>>>> I’ve verified that the orte/util/listener thread is not being started, so 
>>>> I don’t think it should be involved in this problem.
>>>> 
>>>> HTH
>>>> Ralph
>>>> 
>>>>> On Oct 30, 2015, at 8:07 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> 
>>>>> Hmmm…there is a hook that would allow the PMIx server to utilize that 
>>>>> listener thread, but we aren’t currently using it. Each daemon plus 
>>>>> mpirun will call orte_start_listener, but nothing is currently 
>>>>> registering and so the listener in that code is supposed to just return 
>>>>> without starting the thread.
>>>>> 
>>>>> So the only listener thread that should exist is the one inside the PMIx 
>>>>> server itself. If something else is happening, then that would be a bug. 
>>>>> I can look at the orte listener code to ensure that the thread isn’t 
>>>>> incorrectly starting.
>>>>> 
>>>>> 
>>>>>> On Oct 29, 2015, at 10:03 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>> 
>>>>>> Some progress that puzzles me, but it might help you understand. Once the 
>>>>>> deadlock appears, if I manually kill the MPI process on the node where 
>>>>>> the deadlock occurred, the local orte daemon doesn't notice and just 
>>>>>> keeps waiting.
>>>>>> 
>>>>>> Quick question: I am under the impression that the issue is not in the 
>>>>>> PMIx server but somewhere around listener_thread_fn in 
>>>>>> orte/util/listener.c. Possible?
>>>>>> 
>>>>>>   George.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Oct 28, 2015 at 3:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> Should have also clarified: the prior fixes are indeed in the current 
>>>>>> master.
>>>>>> 
>>>>>>> On Oct 28, 2015, at 12:42 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> 
>>>>>>> Nope - I was wrong. The correction on the client side consisted of 
>>>>>>> attempting to time out if the blocking recv failed. We then modified the 
>>>>>>> blocking send/recv so that they would handle errors.
>>>>>>> 
>>>>>>> So that problem occurred -after- the server had correctly called 
>>>>>>> accept. The listener code is in 
>>>>>>> opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>>>> 
>>>>>>> It looks to me like the only way we could drop the accept (assuming the 
>>>>>>> OS doesn’t lose it) is if the file descriptor lies outside the expected 
>>>>>>> range once we fall out of select:
>>>>>>> 
>>>>>>> 
>>>>>>>         /* Spin accepting connections until all active listen sockets
>>>>>>>          * do not have any incoming connections, pushing each connection
>>>>>>>          * onto the event queue for processing
>>>>>>>          */
>>>>>>>         do {
>>>>>>>             accepted_connections = 0;
>>>>>>>             /* according to the man pages, select replaces the given descriptor
>>>>>>>              * set with a subset consisting of those descriptors that are ready
>>>>>>>              * for the specified operation - in this case, a read. So we need to
>>>>>>>              * first check to see if this file descriptor is included in the
>>>>>>>              * returned subset
>>>>>>>              */
>>>>>>>             if (0 == FD_ISSET(pmix_server_globals.listen_socket, &readfds)) {
>>>>>>>                 /* this descriptor is not included */
>>>>>>>                 continue;
>>>>>>>             }
>>>>>>> 
>>>>>>>             /* this descriptor is ready to be read, which means a connection
>>>>>>>              * request has been received - so harvest it. All we want to do
>>>>>>>              * here is accept the connection and push the info onto the event
>>>>>>>              * library for subsequent processing - we don't want to actually
>>>>>>>              * process the connection here as it takes too long, and so the
>>>>>>>              * OS might start rejecting connections due to timeout.
>>>>>>>              */
>>>>>>>             pending_connection = PMIX_NEW(pmix_pending_connection_t);
>>>>>>>             event_assign(&pending_connection->ev, pmix_globals.evbase, -1,
>>>>>>>                          EV_WRITE, connection_handler, pending_connection);
>>>>>>>             pending_connection->sd = accept(pmix_server_globals.listen_socket,
>>>>>>>                                             (struct sockaddr*)&(pending_connection->addr),
>>>>>>>                                             &addrlen);
>>>>>>>             if (pending_connection->sd < 0) {
>>>>>>>                 PMIX_RELEASE(pending_connection);
>>>>>>>                 if (pmix_socket_errno != EAGAIN ||
>>>>>>>                     pmix_socket_errno != EWOULDBLOCK) {
>>>>>>>                     if (EMFILE == pmix_socket_errno) {
>>>>>>>                         PMIX_ERROR_LOG(PMIX_ERR_OUT_OF_RESOURCE);
>>>>>>>                     } else {
>>>>>>>                         pmix_output(0, "listen_thread: accept() failed: %s (%d).",
>>>>>>>                                     strerror(pmix_socket_errno), pmix_socket_errno);
>>>>>>>                     }
>>>>>>>                     goto done;
>>>>>>>                 }
>>>>>>>                 continue;
>>>>>>>             }
>>>>>>> 
>>>>>>>             pmix_output_verbose(8, pmix_globals.debug_output,
>>>>>>>                                 "listen_thread: new connection: (%d, %d)",
>>>>>>>                                 pending_connection->sd, pmix_socket_errno);
>>>>>>>             /* activate the event */
>>>>>>>             event_active(&pending_connection->ev, EV_WRITE, 1);
>>>>>>>             accepted_connections++;
>>>>>>>         } while (accepted_connections > 0);
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 28, 2015, at 12:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>> 
>>>>>>>> Looking at the code, it appears that a fix was committed for this 
>>>>>>>> problem, and that we correctly resolved the issue found by Paul. The 
>>>>>>>> problem is that the fix didn’t get upstreamed, and so it was lost the 
>>>>>>>> next time we refreshed PMIx. Sigh.
>>>>>>>> 
>>>>>>>> Let me try to recreate the fix and have you take a gander at it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Oct 28, 2015, at 12:22 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> 
>>>>>>>>> Here is the discussion - I'm afraid it is fairly lengthy. Ignore the 
>>>>>>>>> hwloc references in it, as that was a separate issue:
>>>>>>>>> 
>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>>>>>>>> 
>>>>>>>>> It definitely sounds like the same issue creeping in again. I’d 
>>>>>>>>> appreciate any thoughts on how to correct it. If it helps, you could 
>>>>>>>>> look at the PMIx master - there are standalone tests in the 
>>>>>>>>> test/simple directory that fork/exec a child and just do the 
>>>>>>>>> connection.
>>>>>>>>> 
>>>>>>>>> https://github.com/pmix/master
>>>>>>>>> 
>>>>>>>>> The test server is simptest.c - it will spawn a single copy of 
>>>>>>>>> simpclient.c by default.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 27, 2015, at 10:14 PM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Interesting. Do you have a pointer to the commit (and/or to the 
>>>>>>>>>> discussion)?
>>>>>>>>>> 
>>>>>>>>>> I looked at the PMIx code and identified a few issues, but 
>>>>>>>>>> unfortunately none of them seems to fix the problem for good. 
>>>>>>>>>> However, now I need more than 1000 runs to get a deadlock (instead 
>>>>>>>>>> of a few tens).
>>>>>>>>>> 
>>>>>>>>>> Looking with "netstat -ax" at the status of the UDS while the 
>>>>>>>>>> processes are deadlocked, I see 2 UDS with the same name: one for 
>>>>>>>>>> the server, which is in LISTEN state, and one for the client, which 
>>>>>>>>>> is in CONNECTING state (while the client has already sent a message 
>>>>>>>>>> into the socket and is now waiting in a blocking receive). This 
>>>>>>>>>> somehow suggests that the server has not yet called accept on the 
>>>>>>>>>> UDS. Unfortunately, there are 3 threads all doing different flavors 
>>>>>>>>>> of event_base and select, so I have a hard time tracking the path of 
>>>>>>>>>> the UDS on the server side.
>>>>>>>>>> 
>>>>>>>>>> So in order to validate my assumption I wrote a minimalistic UDS 
>>>>>>>>>> client and server application and tried different scenarios. The 
>>>>>>>>>> conclusion is that in order to see the same type of output from 
>>>>>>>>>> "netstat -ax" I have to call listen on the server, connect on the 
>>>>>>>>>> client, and not call accept on the server.
>>>>>>>>>> 
>>>>>>>>>> On the same occasion I also confirmed that the UDS holds the data 
>>>>>>>>>> sent, so there is no need for further synchronization in the case 
>>>>>>>>>> where the data is sent first. We only need to find out how the 
>>>>>>>>>> server forgets to call accept.
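>>>>>>>>>> 
>>>>>>>>>> A minimal sketch of that kind of test (the socket path and message are 
>>>>>>>>>> arbitrary; start one copy with "server" as the argument, then run the 
>>>>>>>>>> client):
>>>>>>>>>> 
>>>>>>>>>>     /* Server: bind + listen on a Unix domain socket, then sleep without
>>>>>>>>>>      * ever calling accept(). Client: connect, send, then block in recv().
>>>>>>>>>>      * While both are alive, "netstat -ax" shows the listening and
>>>>>>>>>>      * connecting entries described above, and the sent data just sits in
>>>>>>>>>>      * the socket buffer. */
>>>>>>>>>>     #include <string.h>
>>>>>>>>>>     #include <sys/socket.h>
>>>>>>>>>>     #include <sys/un.h>
>>>>>>>>>>     #include <unistd.h>
>>>>>>>>>> 
>>>>>>>>>>     #define SOCK_PATH "/tmp/uds_demo.sock"   /* arbitrary path */
>>>>>>>>>> 
>>>>>>>>>>     int main(int argc, char **argv)
>>>>>>>>>>     {
>>>>>>>>>>         int sd = socket(AF_UNIX, SOCK_STREAM, 0);
>>>>>>>>>>         struct sockaddr_un addr = { .sun_family = AF_UNIX };
>>>>>>>>>>         strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
>>>>>>>>>> 
>>>>>>>>>>         if (argc > 1 && 0 == strcmp(argv[1], "server")) {
>>>>>>>>>>             unlink(SOCK_PATH);
>>>>>>>>>>             bind(sd, (struct sockaddr *)&addr, sizeof(addr));
>>>>>>>>>>             listen(sd, 8);
>>>>>>>>>>             sleep(60);                       /* never call accept() */
>>>>>>>>>>         } else {
>>>>>>>>>>             char buf[4];
>>>>>>>>>>             connect(sd, (struct sockaddr *)&addr, sizeof(addr));
>>>>>>>>>>             send(sd, "ping", 4, 0);          /* queued despite no accept */
>>>>>>>>>>             recv(sd, buf, sizeof(buf), 0);   /* blocks forever */
>>>>>>>>>>         }
>>>>>>>>>>         close(sd);
>>>>>>>>>>         return 0;
>>>>>>>>>>     }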
>>>>>>>>>> 
>>>>>>>>>>   George.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 27, 2015 at 7:52 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>> wrote:
>>>>>>>>>> Hmmm…this looks like it might be that problem we previously saw 
>>>>>>>>>> where the blocking recv hangs in a proc when the blocking send tries 
>>>>>>>>>> to send before the domain socket is actually ready, and so the send 
>>>>>>>>>> fails on the other end. As I recall, it was something to do with the 
>>>>>>>>>> socket options - and then Paul had a problem on some of his machines, 
>>>>>>>>>> and we backed it out?
>>>>>>>>>> 
>>>>>>>>>> I wonder if that’s what is biting us here again, and what we need is 
>>>>>>>>>> to either remove the blocking send/recv’s altogether, or figure out 
>>>>>>>>>> a way to wait until the socket is really ready.
>>>>>>>>>> 
>>>>>>>>>> Any thoughts?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Oct 27, 2015, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> It appears the branch solves the problem at least partially. I asked 
>>>>>>>>>>> one of my students to hammer it pretty badly, and he reported that 
>>>>>>>>>>> the deadlocks still occur. He also graciously provided some 
>>>>>>>>>>> stack traces:
>>>>>>>>>>> 
>>>>>>>>>>> #0  0x00007f4bd5274aed in nanosleep () from /lib64/libc.so.6
>>>>>>>>>>> #1  0x00007f4bd52a9c94 in usleep () from /lib64/libc.so.6
>>>>>>>>>>> #2  0x00007f4bd2e42b00 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, 
>>>>>>>>>>> nprocs=0, info=0x7fff3c561960, 
>>>>>>>>>>>     ninfo=1) at src/client/pmix_client_fence.c:100
>>>>>>>>>>> #3  0x00007f4bd306e6d2 in pmix1_fence (procs=0x0, collect_data=1) 
>>>>>>>>>>> at pmix1_client.c:306
>>>>>>>>>>> #4  0x00007f4bd57d5cc3 in ompi_mpi_init (argc=3, 
>>>>>>>>>>> argv=0x7fff3c561ea8, requested=3, 
>>>>>>>>>>>     provided=0x7fff3c561d84) at runtime/ompi_mpi_init.c:644
>>>>>>>>>>> #5  0x00007f4bd5813399 in PMPI_Init_thread (argc=0x7fff3c561d7c, 
>>>>>>>>>>> argv=0x7fff3c561d70, required=3, 
>>>>>>>>>>>     provided=0x7fff3c561d84) at pinit_thread.c:69
>>>>>>>>>>> #6  0x0000000000401516 in main (argc=3, argv=0x7fff3c561ea8) at 
>>>>>>>>>>> osu_mbw_mr.c:86
>>>>>>>>>>> 
>>>>>>>>>>> And another process:
>>>>>>>>>>> 
>>>>>>>>>>> #0  0x00007f7b9d7d8bdc in recv () from /lib64/libpthread.so.0
>>>>>>>>>>> #1  0x00007f7b9b0aa42d in 
>>>>>>>>>>> opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, 
>>>>>>>>>>> data=0x7ffd62139004 "", 
>>>>>>>>>>>     size=4) at src/usock/usock.c:168
>>>>>>>>>>> #2  0x00007f7b9b0af5d9 in recv_connect_ack (sd=13) at 
>>>>>>>>>>> src/client/pmix_client.c:844
>>>>>>>>>>> #3  0x00007f7b9b0b085e in usock_connect (addr=0x7ffd62139330) at 
>>>>>>>>>>> src/client/pmix_client.c:1110
>>>>>>>>>>> #4  0x00007f7b9b0acc24 in connect_to_server 
>>>>>>>>>>> (address=0x7ffd62139330, cbdata=0x7ffd621390e0)
>>>>>>>>>>>     at src/client/pmix_client.c:181
>>>>>>>>>>> #5  0x00007f7b9b0ad569 in OPAL_PMIX_PMIX1XX_PMIx_Init 
>>>>>>>>>>> (proc=0x7f7b9b4e9b60)
>>>>>>>>>>>     at src/client/pmix_client.c:362
>>>>>>>>>>> #6  0x00007f7b9b2dbd9d in pmix1_client_init () at pmix1_client.c:99
>>>>>>>>>>> #7  0x00007f7b9b4eb95f in pmi_component_query 
>>>>>>>>>>> (module=0x7ffd62139490, priority=0x7ffd6213948c)
>>>>>>>>>>>     at ess_pmi_component.c:90
>>>>>>>>>>> #8  0x00007f7b9ce70ec5 in mca_base_select (type_name=0x7f7b9d20e059 
>>>>>>>>>>> "ess", output_id=-1, 
>>>>>>>>>>>     components_available=0x7f7b9d431eb0, 
>>>>>>>>>>> best_module=0x7ffd621394d0, best_component=0x7ffd621394d8, 
>>>>>>>>>>>     priority_out=0x0) at mca_base_components_select.c:77
>>>>>>>>>>> #9  0x00007f7b9d1a956b in orte_ess_base_select () at 
>>>>>>>>>>> base/ess_base_select.c:40
>>>>>>>>>>> #10 0x00007f7b9d160449 in orte_init (pargc=0x0, pargv=0x0, 
>>>>>>>>>>> flags=32) at runtime/orte_init.c:219
>>>>>>>>>>> #11 0x00007f7b9da4377a in ompi_mpi_init (argc=3, 
>>>>>>>>>>> argv=0x7ffd621397f8, requested=3, 
>>>>>>>>>>>     provided=0x7ffd621396d4) at runtime/ompi_mpi_init.c:488
>>>>>>>>>>> #12 0x00007f7b9da81399 in PMPI_Init_thread (argc=0x7ffd621396cc, 
>>>>>>>>>>> argv=0x7ffd621396c0, required=3, 
>>>>>>>>>>>     provided=0x7ffd621396d4) at pinit_thread.c:69
>>>>>>>>>>> #13 0x0000000000401516 in main (argc=3, argv=0x7ffd621397f8) at 
>>>>>>>>>>> osu_mbw_mr.c:86
>>>>>>>>>>> 
>>>>>>>>>>>   George.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Oct 27, 2015 at 2:36 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> I haven’t been able to replicate this when using the branch in this 
>>>>>>>>>>> PR:
>>>>>>>>>>> 
>>>>>>>>>>> https://github.com/open-mpi/ompi/pull/1073
>>>>>>>>>>> 
>>>>>>>>>>> Would you mind giving it a try? It fixes some other race conditions 
>>>>>>>>>>> and might pick this one up too.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Oct 27, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Okay, I’ll take a look - I’ve been chasing a race condition that 
>>>>>>>>>>>> might be related
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 27, 2015, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> No, it's using 2 nodes.
>>>>>>>>>>>>>   George.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain 
>>>>>>>>>>>>> <r...@open-mpi.org> wrote:
>>>>>>>>>>>>> Is this on a single node?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Oct 27, 2015, at 9:25 AM, George Bosilca 
>>>>>>>>>>>>>> <bosi...@icl.utk.edu> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I get intermittent deadlocks with the latest trunk. The smallest 
>>>>>>>>>>>>>> reproducer is a shell for loop around a small (2 processes), 
>>>>>>>>>>>>>> short (20 seconds) MPI application. After a few tens of iterations 
>>>>>>>>>>>>>> MPI_Init will deadlock with the following backtrace:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #0  0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6
>>>>>>>>>>>>>> #1  0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6
>>>>>>>>>>>>>> #2  0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence 
>>>>>>>>>>>>>> (procs=0x0, nprocs=0, info=0x7ffd7934fb90, 
>>>>>>>>>>>>>>     ninfo=1) at src/client/pmix_client_fence.c:100
>>>>>>>>>>>>>> #3  0x00007fa9498376a2 in pmix1_fence (procs=0x0, 
>>>>>>>>>>>>>> collect_data=1) at pmix1_client.c:305
>>>>>>>>>>>>>> #4  0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, 
>>>>>>>>>>>>>> argv=0x7ffd793500a8, requested=3, 
>>>>>>>>>>>>>>     provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645
>>>>>>>>>>>>>> #5  0x00007fa94bb77281 in PMPI_Init_thread (argc=0x7ffd7934ff8c, 
>>>>>>>>>>>>>> argv=0x7ffd7934ff80, required=3, 
>>>>>>>>>>>>>>     provided=0x7ffd7934ff94) at pinit_thread.c:69
>>>>>>>>>>>>>> #6  0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at 
>>>>>>>>>>>>>> osu_mbw_mr.c:86
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On my machines this is reproducible at 100% after anywhere 
>>>>>>>>>>>>>> between 50 and 100 iterations.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   Thanks,
>>>>>>>>>>>>>>     George.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>>>>> Link to this post: 
>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18280.php
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>>>> Link to this post: 
>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18281.php
>>>>>>>>>>>>> 
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>>>> Link to this post: 
>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18282.php
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>> Link to this post: 
>>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18284.php
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>> Link to this post: 
>>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18292.php
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>> Link to this post: 
>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18294.php
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>> Link to this post: 
>>>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18302.php
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18309.php
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18320.php
>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2015/10/18323.php
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> С Уважением, Поляков Артем Юрьевич
>>>> Best regards, Artem Y. Polyakov
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2015/11/18334.php
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2015/11/18335.php
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2015/11/18336.php
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/11/18337.php
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/11/18340.php
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/11/18341.php
>> 
>> 
>> 
>> -- 
>> С Уважением, Поляков Артем Юрьевич
>> Best regards, Artem Y. Polyakov
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/11/18345.php
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/11/18346.php
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/11/18347.php
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/11/18348.php
