All 10k tests completed successfully. Nysal pinpointed the real problem behind the deadlocks. :+1:
George. On Mon, Nov 9, 2015 at 1:13 PM, Ralph Castain <r...@open-mpi.org> wrote: > Looking at it, I think I see what was happening. The thread would start, > but then immediately see that the active flag was false and would exit. > This left the server without any listening thread - but it wouldn’t detect > this had happened. It was therefore a race over whether the thread > checked the flag before the server set it. > > Thanks Nysal - I believe this should indeed fix the problem! > > > On Nov 9, 2015, at 9:04 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > Clearly Nysal has a valid point there. I launched a stress test with Nysal's > suggestion in the code, and so far it's up to a few hundred iterations > without a deadlock. I would not claim victory yet; I launched a 10k cycle to > see where we stand (btw this never passed before). > I'll let you know the outcome. > > George. > > > On Mon, Nov 9, 2015 at 11:55 AM, Artem Polyakov <artpo...@gmail.com> > wrote: > >> >> >> 2015-11-09 22:42 GMT+06:00 Artem Polyakov <artpo...@gmail.com>: >> >>> This is a very good point, Nysal! >>> >>> This is definitely a problem and I can say even more: on average 3 out of every >>> 10 tasks were affected by this bug. Once the PR ( >>> https://github.com/pmix/master/pull/8) was applied I was able to run >>> 100 testing tasks without any hangs. >>> >>> Here is some more information on my symptoms. I was observing this without >>> OMPI, just running the pmix_client test binary from the PMIx test suite with the SLURM >>> PMIx plugin. >>> Periodically the application was hanging. Investigation showed that not all >>> processes were able to initialize correctly. >>> Here is what such a client's backtrace looks like: >>> >> >> P.S. I think that this backtrace may be relevant to George's problem as >> well. In my case not all of the processes were hanging in >> connect_to_server; most of them were able to move forward and reach Fence. >> George, was the backtrace that you've posted the same on both processes, >> or was it a "random" one from one of them? >> >> >>> (gdb) bt >>> #0 0x00007f1448f1b7eb in recv () from >>> /lib/x86_64-linux-gnu/libpthread.so.0 >>> #1 0x00007f144914c191 in pmix_usock_recv_blocking (sd=9, >>> data=0x7fff367f7c64 "", size=4) at src/usock/usock.c:166 >>> #2 0x00007f1449152d18 in recv_connect_ack (sd=9) at >>> src/client/pmix_client.c:837 >>> #3 0x00007f14491546bf in usock_connect (addr=0x7fff367f7d60) at >>> src/client/pmix_client.c:1103 >>> #4 0x00007f144914f94c in connect_to_server (address=0x7fff367f7d60, >>> cbdata=0x7fff367f7dd0) at src/client/pmix_client.c:179 >>> #5 0x00007f1449150421 in PMIx_Init (proc=0x7fff367f81d0) at >>> src/client/pmix_client.c:355 >>> #6 0x0000000000401b97 in main (argc=9, argv=0x7fff367f83d8) at >>> pmix_client.c:62 >>> >>> >>> The server-side debug has the following lines at the end of the file: >>> [cn33:00482] pmix:server register client slurm.pmix.22.0:10 >>> [cn33:00482] pmix:server _register_client for nspace slurm.pmix.22.0 >>> rank 10 >>> [cn33:00482] pmix:server setup_fork for nspace slurm.pmix.22.0 rank 10 >>> >>> In normal operation the following lines should appear after the lines above: >>> .... >>> [cn33:00188] listen_thread: new connection: (26, 0) >>> [cn33:00188] connection_handler: new connection: 26 >>> [cn33:00188] RECV CONNECT ACK FROM PEER ON SOCKET 26 >>> [cn33:00188] waiting for blocking recv of 16 bytes >>> [cn33:00188] blocking receive complete from remote >>> ....
>>> >>> At the client side I see the following lines: >>> [cn33:00491] usock_peer_try_connect: attempting to connect to server >>> [cn33:00491] usock_peer_try_connect: attempting to connect to server on >>> socket 10 >>> [cn33:00491] pmix: SEND CONNECT ACK >>> [cn33:00491] sec: native create_cred >>> [cn33:00491] sec: using credential 1000:1000 >>> [cn33:00491] send blocking of 54 bytes to socket 10 >>> [cn33:00491] blocking send complete to socket 10 >>> [cn33:00491] pmix: RECV CONNECT ACK FROM SERVER >>> [cn33:00491] waiting for blocking recv of 4 bytes >>> [cn33:00491] blocking_recv received error 11:Resource temporarily >>> unavailable from remote - cycling >>> [cn33:00491] blocking_recv received error 11:Resource temporarily >>> unavailable from remote - cycling >>> [... repeated many times ...] >>> >>> With the fix for the problem highlighted by Nysal, everything runs cleanly. >>> >>> >>> 2015-11-09 10:53 GMT+06:00 Nysal Jan K A <jny...@gmail.com>: >>> >>>> In listen_thread(): >>>> 194 while (pmix_server_globals.listen_thread_active) { >>>> 195 FD_ZERO(&readfds); >>>> 196 FD_SET(pmix_server_globals.listen_socket, &readfds); >>>> 197 max = pmix_server_globals.listen_socket; >>>> >>>> Is it possible that pmix_server_globals.listen_thread_active can be >>>> false, in which case the thread just exits and will never call accept()? >>>> >>>> In pmix_start_listening(): >>>> 147 /* fork off the listener thread */ >>>> 148 if (0 > pthread_create(&engine, NULL, listen_thread, NULL)) >>>> { >>>> 149 return PMIX_ERROR; >>>> 150 } >>>> 151 pmix_server_globals.listen_thread_active = true; >>>> >>>> pmix_server_globals.listen_thread_active is set to true after the >>>> thread is created; could this cause a race? >>>> listen_thread_active might also need to be declared as volatile. >>>> >>>> Regards >>>> --Nysal >>>> >>>> On Sun, Nov 8, 2015 at 10:38 PM, George Bosilca <bosi...@icl.utk.edu> >>>> wrote: >>>> >>>>> We had a power outage last week and the local disks on our cluster >>>>> were wiped out. My tester was in there. But I can rewrite it after SC. >>>>> >>>>> George. >>>>> >>>>> On Sat, Nov 7, 2015 at 12:04 PM, Ralph Castain <r...@open-mpi.org> >>>>> wrote: >>>>> >>>>>> Could you send me your stress test? I’m wondering if it is just >>>>>> something about how we set socket options. >>>>>> >>>>>> >>>>>> On Nov 7, 2015, at 8:58 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>> wrote: >>>>>> >>>>>> I had to postpone this until after SC. However, I ran a UDS stress test for 3 days, reproducing the opening and sending of data (what >>>>>> Ralph >>>>>> said in his email), and I never could get a deadlock. >>>>>> >>>>>> George. >>>>>> >>>>>> >>>>>> On Sat, Nov 7, 2015 at 11:26 AM, Ralph Castain <r...@open-mpi.org> >>>>>> wrote: >>>>>> >>>>>>> George was looking into it, but I don’t know if he has had time >>>>>>> recently to continue the investigation. We understand “what” is >>>>>>> happening >>>>>>> (accept sometimes ignores the connection), but we don’t yet know “why”. >>>>>>> I’ve done some digging around the web, and found that sometimes you can >>>>>>> try >>>>>>> to talk to a Unix Domain Socket too quickly - i.e., you open it and then >>>>>>> send to it, but the OS hasn’t yet set it up. In those cases, you can >>>>>>> hang >>>>>>> the socket. However, I’ve tried adding some artificial delay, and while >>>>>>> it >>>>>>> helped, it didn’t completely solve the problem.
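For readers following along: Nysal's observation is a textbook thread-startup race. pmix_start_listening() creates the listener thread first and only afterwards sets pmix_server_globals.listen_thread_active = true, so if the new thread is scheduled before the flag is flipped it evaluates the while() condition as false, returns immediately, and the server is left with nothing ever calling select()/accept(). Below is a minimal, self-contained sketch of the reordering fix; the identifiers mirror the quoted snippets, but this is an illustration, not the actual content of the pmix/master PR referenced above.

/* Illustrative sketch only - identifiers mirror the quoted PMIx snippets
 * (listen_thread, listen_thread_active, pmix_start_listening), not the
 * actual patch. Build with: cc race_fix.c -o race_fix -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static volatile bool listen_thread_active = false; /* re-read on every pass */
static pthread_t engine;

static void *listen_thread(void *arg)
{
    (void)arg;
    /* If this thread runs before the flag is set, the loop is never entered
     * and the thread silently exits - exactly the race Nysal describes. */
    while (listen_thread_active) {
        /* real code: FD_ZERO/FD_SET, select() on the listen socket, accept() */
        usleep(1000);
    }
    return NULL;
}

static int pmix_start_listening(void)
{
    /* The fix: publish the flag BEFORE forking off the listener thread. */
    listen_thread_active = true;
    if (0 != pthread_create(&engine, NULL, listen_thread, NULL)) {
        listen_thread_active = false;  /* roll back on failure */
        return -1;                     /* PMIX_ERROR in the real code */
    }
    return 0;                          /* PMIX_SUCCESS in the real code */
}

int main(void)
{
    if (0 != pmix_start_listening()) return 1;
    sleep(1);                          /* server runs; clients can connect */
    listen_thread_active = false;      /* request shutdown */
    pthread_join(engine, NULL);
    puts("listener thread exited cleanly");
    return 0;
}

Note that volatile only stops the compiler from caching the flag, which is what Nysal suggests; a strictly portable version would use a C11 atomic or protect the flag with a mutex, since volatile alone gives no cross-thread ordering guarantees.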
>>>>>>> >>>>>>> I have an idea for a workaround (set a timer and retry after >>>>>>> a while), but would obviously prefer a real solution. I’m not even sure >>>>>>> it >>>>>>> will work, as it is unclear whether the server (which is the one hung in >>>>>>> accept) >>>>>>> will break free if the client closes the socket and retries. >>>>>>> >>>>>>> >>>>>>> On Nov 6, 2015, at 10:53 PM, Artem Polyakov <artpo...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> Hello, is there any progress on this topic? This affects our PMIx >>>>>>> measurements. >>>>>>> >>>>>>> 2015-10-30 21:21 GMT+06:00 Ralph Castain <r...@open-mpi.org>: >>>>>>> >>>>>>>> I’ve verified that the orte/util/listener thread is not being >>>>>>>> started, so I don’t think it should be involved in this problem. >>>>>>>> >>>>>>>> HTH >>>>>>>> Ralph >>>>>>>> >>>>>>>> On Oct 30, 2015, at 8:07 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hmmm…there is a hook that would allow the PMIx server to utilize >>>>>>>> that listener thread, but we aren’t currently using it. Each daemon >>>>>>>> plus >>>>>>>> mpirun will call orte_start_listener, but nothing is currently >>>>>>>> registering, >>>>>>>> and so the listener in that code is supposed to just return without >>>>>>>> starting the thread. >>>>>>>> >>>>>>>> So the only listener thread that should exist is the one inside the >>>>>>>> PMIx server itself. If something else is happening, then that would be >>>>>>>> a >>>>>>>> bug. I can look at the orte listener code to ensure that the thread >>>>>>>> isn’t >>>>>>>> incorrectly starting. >>>>>>>> >>>>>>>> >>>>>>>> On Oct 29, 2015, at 10:03 PM, George Bosilca <bosi...@icl.utk.edu> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Some progress that puzzles me but might help you understand. Once >>>>>>>> the deadlock appears, if I manually kill the MPI process on the node >>>>>>>> where >>>>>>>> the deadlock was created, the local orte daemon doesn't notice and will >>>>>>>> just keep waiting. >>>>>>>> >>>>>>>> Quick question: I am under the impression that the issue is not in >>>>>>>> the PMIx server but somewhere around the listener_thread_fn in >>>>>>>> orte/util/listener.c. Possible? >>>>>>>> >>>>>>>> George. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Oct 28, 2015 at 3:56 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Should have also clarified: the prior fixes are indeed in the >>>>>>>>> current master. >>>>>>>>> >>>>>>>>> On Oct 28, 2015, at 12:42 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Nope - I was wrong. The correction on the client side consisted of >>>>>>>>> attempting to time out if the blocking recv failed. We then modified >>>>>>>>> the >>>>>>>>> blocking send/recv so they would handle errors. >>>>>>>>> >>>>>>>>> So that problem occurred -after- the server had correctly called >>>>>>>>> accept.
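The "error 11 ... cycling" lines in the client log and the backtraces stuck in pmix_usock_recv_blocking() come from this kind of loop: the descriptor is non-blocking, so recv() keeps returning EAGAIN until the server finally accepts and answers. A hedged sketch of the "cycle, but eventually give up" idea discussed above follows; the real pmix_usock_recv_blocking() is different, and the function name, timeout parameter, and return convention here are made up for illustration.

/* Sketch only - not the PMIx implementation. Retries EAGAIN/EWOULDBLOCK/EINTR
 * the way the debug log shows ("cycling"), but bails out after a deadline so
 * a server that never accepts cannot hang the client forever. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

bool recv_blocking_with_deadline(int sd, char *data, size_t size, int timeout_sec)
{
    size_t cnt = 0;
    time_t deadline = time(NULL) + timeout_sec;

    while (cnt < size) {
        ssize_t rc = recv(sd, data + cnt, size - cnt, 0);
        if (rc > 0) {
            cnt += (size_t)rc;          /* got part of the message, keep going */
        } else if (0 == rc) {
            return false;               /* peer closed the socket */
        } else if (EAGAIN == errno || EWOULDBLOCK == errno || EINTR == errno) {
            if (time(NULL) >= deadline) {
                return false;           /* stop cycling and report the failure */
            }
            usleep(1000);               /* brief pause instead of a hot spin */
        } else {
            return false;               /* hard error */
        }
    }
    return true;
}

With a bound like this, a lost accept() becomes a visible connection failure the caller can retry instead of an indefinite hang - essentially the timer-and-retry workaround mentioned above, although the real fix remains making sure the listener thread is actually running.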
The listener code is in >>>>>>>>> opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c >>>>>>>>> >>>>>>>>> It looks to me like the only way we could drop the accept >>>>>>>>> (assuming the OS doesn’t lose it) is if the file descriptor lies >>>>>>>>> outside >>>>>>>>> the expected range once we fall out of select: >>>>>>>>> >>>>>>>>> >>>>>>>>> /* Spin accepting connections until all active listen >>>>>>>>> sockets >>>>>>>>> * do not have any incoming connections, pushing each >>>>>>>>> connection >>>>>>>>> * onto the event queue for processing >>>>>>>>> */ >>>>>>>>> do { >>>>>>>>> accepted_connections = 0; >>>>>>>>> /* according to the man pages, select replaces the >>>>>>>>> given descriptor >>>>>>>>> * set with a subset consisting of those descriptors >>>>>>>>> that are ready >>>>>>>>> * for the specified operation - in this case, a read. >>>>>>>>> So we need to >>>>>>>>> * first check to see if this file descriptor is >>>>>>>>> included in the >>>>>>>>> * returned subset >>>>>>>>> */ >>>>>>>>> if (0 == FD_ISSET(pmix_server_globals.listen_socket, >>>>>>>>> &readfds)) { >>>>>>>>> /* this descriptor is not included */ >>>>>>>>> continue; >>>>>>>>> } >>>>>>>>> >>>>>>>>> /* this descriptor is ready to be read, which means a >>>>>>>>> connection >>>>>>>>> * request has been received - so harvest it. All we >>>>>>>>> want to do >>>>>>>>> * here is accept the connection and push the info >>>>>>>>> onto the event >>>>>>>>> * library for subsequent processing - we don't want >>>>>>>>> to actually >>>>>>>>> * process the connection here as it takes too long, >>>>>>>>> and so the >>>>>>>>> * OS might start rejecting connections due to timeout. >>>>>>>>> */ >>>>>>>>> pending_connection = >>>>>>>>> PMIX_NEW(pmix_pending_connection_t); >>>>>>>>> event_assign(&pending_connection->ev, >>>>>>>>> pmix_globals.evbase, -1, >>>>>>>>> EV_WRITE, connection_handler, >>>>>>>>> pending_connection); >>>>>>>>> pending_connection->sd = >>>>>>>>> accept(pmix_server_globals.listen_socket, >>>>>>>>> (struct >>>>>>>>> sockaddr*)&(pending_connection->addr), >>>>>>>>> &addrlen); >>>>>>>>> if (pending_connection->sd < 0) { >>>>>>>>> PMIX_RELEASE(pending_connection); >>>>>>>>> if (pmix_socket_errno != EAGAIN || >>>>>>>>> pmix_socket_errno != EWOULDBLOCK) { >>>>>>>>> if (EMFILE == pmix_socket_errno) { >>>>>>>>> PMIX_ERROR_LOG(PMIX_ERR_OUT_OF_RESOURCE); >>>>>>>>> } else { >>>>>>>>> pmix_output(0, "listen_thread: accept() >>>>>>>>> failed: %s (%d).", >>>>>>>>> strerror(pmix_socket_errno), >>>>>>>>> pmix_socket_errno); >>>>>>>>> } >>>>>>>>> goto done; >>>>>>>>> } >>>>>>>>> continue; >>>>>>>>> } >>>>>>>>> >>>>>>>>> pmix_output_verbose(8, pmix_globals.debug_output, >>>>>>>>> "listen_thread: new connection: >>>>>>>>> (%d, %d)", >>>>>>>>> pending_connection->sd, >>>>>>>>> pmix_socket_errno); >>>>>>>>> /* activate the event */ >>>>>>>>> event_active(&pending_connection->ev, EV_WRITE, 1); >>>>>>>>> accepted_connections++; >>>>>>>>> } while (accepted_connections > 0); >>>>>>>>> >>>>>>>>> >>>>>>>>> On Oct 28, 2015, at 12:25 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Looking at the code, it appears that a fix was committed for this >>>>>>>>> problem, and that we correctly resolved the issue found by Paul. The >>>>>>>>> problem is that the fix didn’t get upstreamed, and so it was lost the >>>>>>>>> next >>>>>>>>> time we refreshed PMIx. Sigh. >>>>>>>>> >>>>>>>>> Let me try to recreate the fix and have you take a gander at it. 
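As an aside on the hand-off in the quoted listener loop: it uses a common libevent idiom in which an event is event_assign()ed with no file descriptor (fd = -1) and then manually fired with event_active(), so the accept()ed socket is processed later on the event base rather than inside the listener loop itself. A stripped-down sketch of that idiom is below, assuming libevent 2.x; this is not the PMIx code, and in the real multi-threaded case libevent's locking must also be enabled (e.g. evthread_use_pthreads()).

/* Minimal libevent 2.x sketch of the "assign with fd = -1, then manually
 * activate" hand-off pattern. Build with -levent. Everything runs in one
 * thread here for simplicity. */
#include <event2/event.h>
#include <stdio.h>

struct pending_connection {
    struct event ev;
    int sd;                       /* the accept()ed descriptor in the real code */
};

static void connection_handler(evutil_socket_t fd, short what, void *arg)
{
    (void)fd; (void)what;
    struct pending_connection *pc = arg;
    printf("processing pending connection sd=%d on the event base\n", pc->sd);
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct pending_connection pc = { .sd = 42 };   /* stand-in for accept()'s result */

    /* No fd to monitor: the event exists only so it can be activated by hand. */
    event_assign(&pc.ev, base, -1, EV_WRITE, connection_handler, &pc);

    /* In the listener thread this would follow a successful accept(). */
    event_active(&pc.ev, EV_WRITE, 1);

    event_base_dispatch(base);    /* runs the handler, then exits: nothing pending */
    event_base_free(base);
    return 0;
}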
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Oct 28, 2015, at 12:22 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Here is the discussion - I’m afraid it is fairly lengthy. Ignore the >>>>>>>>> hwloc references in it, as that was a separate issue: >>>>>>>>> >>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php >>>>>>>>> >>>>>>>>> It definitely sounds like the same issue creeping in again. I’d >>>>>>>>> appreciate any thoughts on how to correct it. If it helps, you could >>>>>>>>> look >>>>>>>>> at the PMIx master - there are standalone tests in the test/simple >>>>>>>>> directory that fork/exec a child and just do the connection. >>>>>>>>> >>>>>>>>> https://github.com/pmix/master >>>>>>>>> >>>>>>>>> The test server is simptest.c - it will spawn a single copy of >>>>>>>>> simpclient.c by default. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Oct 27, 2015, at 10:14 PM, George Bosilca <bosi...@icl.utk.edu> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Interesting. Do you have a pointer to the commit (and/or to the >>>>>>>>> discussion)? >>>>>>>>> >>>>>>>>> I looked at the PMIx code, and I have identified a few issues, but >>>>>>>>> unfortunately none of them seem to fix the problem for good. However, >>>>>>>>> now I >>>>>>>>> need more than 1000 runs to get a deadlock (instead of a few tens). >>>>>>>>> >>>>>>>>> Looking with "netstat -ax" at the status of the UDS while the >>>>>>>>> processes are deadlocked, I see 2 UDS with the same name: one from the >>>>>>>>> server, which is in LISTEN state, and one from the client, which is >>>>>>>>> in >>>>>>>>> CONNECTING state (while the client has already sent a message into the >>>>>>>>> socket and >>>>>>>>> is now waiting in a blocking receive). This somehow suggests that the >>>>>>>>> server >>>>>>>>> has not yet called accept on the UDS. Unfortunately, there are 3 >>>>>>>>> threads >>>>>>>>> all doing different flavors of event_base and select, so I have a hard >>>>>>>>> time >>>>>>>>> tracking the path of the UDS on the server side. >>>>>>>>> >>>>>>>>> So in order to validate my assumption I wrote a minimalistic UDS >>>>>>>>> client and server application and tried different scenarios. The >>>>>>>>> conclusion >>>>>>>>> is that in order to see the same type of output from "netstat -ax" I >>>>>>>>> have >>>>>>>>> to call listen on the server, connect on the client, and not call >>>>>>>>> accept >>>>>>>>> on the server. >>>>>>>>> >>>>>>>>> On the same occasion I also confirmed that the UDS holds >>>>>>>>> the data sent, so there is no need for further synchronization for the >>>>>>>>> case >>>>>>>>> where the data is sent first. We only need to find out how the server >>>>>>>>> forgets to call accept. >>>>>>>>> >>>>>>>>> George. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Oct 27, 2015 at 7:52 PM, Ralph Castain <r...@open-mpi.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hmmm…this looks like it might be that problem we previously saw >>>>>>>>>> where the blocking recv hangs in a proc when the blocking send tries >>>>>>>>>> to >>>>>>>>>> send before the domain socket is actually ready, and so the send >>>>>>>>>> fails on >>>>>>>>>> the other end. As I recall, it was something to do with the >>>>>>>>>> socket options - >>>>>>>>>> and then Paul had a problem on some of his machines, and we backed >>>>>>>>>> it out?
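George's minimalistic experiment is easy to reproduce. The sketch below is a toy of my own (not the tester that was lost; the socket path and message are arbitrary): the server listen()s and then deliberately never accept()s, yet the client's connect() and send() both succeed because the kernel queues the connection and buffers the data. While the two processes sleep, "netstat -ax" shows the listening socket plus the unaccepted peer, matching the picture described above.

/* Toy reproducer: UNIX domain socket server that listens but never accepts,
 * and a forked client that connects and sends anyway. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>

#define SOCK_PATH "/tmp/uds_no_accept_demo.sock"

int main(void)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCK_PATH);
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 8) < 0) {
        perror("server setup");
        return 1;
    }

    pid_t pid = fork();
    if (0 == pid) {                            /* client */
        int cli = socket(AF_UNIX, SOCK_STREAM, 0);
        if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("client connect");
            _exit(1);
        }
        const char msg[] = "connect ack";
        if (send(cli, msg, sizeof(msg), 0) < 0) {  /* buffered by the kernel */
            perror("client send");
            _exit(1);
        }
        printf("client: connected and sent %zu bytes with no accept()\n", sizeof(msg));
        sleep(30);             /* time to run: netstat -ax | grep uds_no_accept */
        _exit(0);
    }

    sleep(30);                                 /* server never calls accept() */
    waitpid(pid, NULL, 0);
    unlink(SOCK_PATH);
    return 0;
}

The exact state label netstat prints for the unaccepted side may vary between kernels, but the key point matches the observation above: the data survives in the socket buffer, so no extra client-side synchronization is needed once the server finally accepts.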
>>>>>>>>>> >>>>>>>>>> I wonder if that’s what is biting us here again, and what we need >>>>>>>>>> is to either remove the blocking send/recv's altogether, or figure >>>>>>>>>> out a >>>>>>>>>> way to wait until the socket is really ready. >>>>>>>>>> >>>>>>>>>> Any thoughts? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Oct 27, 2015, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> It appears the branch solves the problem at least partially. I >>>>>>>>>> asked one of my students to hammer it pretty badly, and he reported >>>>>>>>>> that >>>>>>>>>> the deadlocks still occur. He also graciously provided some >>>>>>>>>> stacktraces: >>>>>>>>>> >>>>>>>>>> #0 0x00007f4bd5274aed in nanosleep () from /lib64/libc.so.6 >>>>>>>>>> #1 0x00007f4bd52a9c94 in usleep () from /lib64/libc.so.6 >>>>>>>>>> #2 0x00007f4bd2e42b00 in OPAL_PMIX_PMIX1XX_PMIx_Fence >>>>>>>>>> (procs=0x0, nprocs=0, info=0x7fff3c561960, >>>>>>>>>> ninfo=1) at src/client/pmix_client_fence.c:100 >>>>>>>>>> #3 0x00007f4bd306e6d2 in pmix1_fence (procs=0x0, collect_data=1) >>>>>>>>>> at pmix1_client.c:306 >>>>>>>>>> #4 0x00007f4bd57d5cc3 in ompi_mpi_init (argc=3, >>>>>>>>>> argv=0x7fff3c561ea8, requested=3, >>>>>>>>>> provided=0x7fff3c561d84) at runtime/ompi_mpi_init.c:644 >>>>>>>>>> #5 0x00007f4bd5813399 in PMPI_Init_thread (argc=0x7fff3c561d7c, >>>>>>>>>> argv=0x7fff3c561d70, required=3, >>>>>>>>>> provided=0x7fff3c561d84) at pinit_thread.c:69 >>>>>>>>>> #6 0x0000000000401516 in main (argc=3, argv=0x7fff3c561ea8) at >>>>>>>>>> osu_mbw_mr.c:86 >>>>>>>>>> >>>>>>>>>> And another process: >>>>>>>>>> >>>>>>>>>> #0 0x00007f7b9d7d8bdc in recv () from /lib64/libpthread.so.0 >>>>>>>>>> #1 0x00007f7b9b0aa42d in >>>>>>>>>> opal_pmix_pmix1xx_pmix_usock_recv_blocking (sd=13, >>>>>>>>>> data=0x7ffd62139004 "", >>>>>>>>>> size=4) at src/usock/usock.c:168 >>>>>>>>>> #2 0x00007f7b9b0af5d9 in recv_connect_ack (sd=13) at >>>>>>>>>> src/client/pmix_client.c:844 >>>>>>>>>> #3 0x00007f7b9b0b085e in usock_connect (addr=0x7ffd62139330) at >>>>>>>>>> src/client/pmix_client.c:1110 >>>>>>>>>> #4 0x00007f7b9b0acc24 in connect_to_server >>>>>>>>>> (address=0x7ffd62139330, cbdata=0x7ffd621390e0) >>>>>>>>>> at src/client/pmix_client.c:181 >>>>>>>>>> #5 0x00007f7b9b0ad569 in OPAL_PMIX_PMIX1XX_PMIx_Init >>>>>>>>>> (proc=0x7f7b9b4e9b60) >>>>>>>>>> at src/client/pmix_client.c:362 >>>>>>>>>> #6 0x00007f7b9b2dbd9d in pmix1_client_init () at >>>>>>>>>> pmix1_client.c:99 >>>>>>>>>> #7 0x00007f7b9b4eb95f in pmi_component_query >>>>>>>>>> (module=0x7ffd62139490, priority=0x7ffd6213948c) >>>>>>>>>> at ess_pmi_component.c:90 >>>>>>>>>> #8 0x00007f7b9ce70ec5 in mca_base_select >>>>>>>>>> (type_name=0x7f7b9d20e059 "ess", output_id=-1, >>>>>>>>>> components_available=0x7f7b9d431eb0, >>>>>>>>>> best_module=0x7ffd621394d0, best_component=0x7ffd621394d8, >>>>>>>>>> priority_out=0x0) at mca_base_components_select.c:77 >>>>>>>>>> #9 0x00007f7b9d1a956b in orte_ess_base_select () at >>>>>>>>>> base/ess_base_select.c:40 >>>>>>>>>> #10 0x00007f7b9d160449 in orte_init (pargc=0x0, pargv=0x0, >>>>>>>>>> flags=32) at runtime/orte_init.c:219 >>>>>>>>>> #11 0x00007f7b9da4377a in ompi_mpi_init (argc=3, >>>>>>>>>> argv=0x7ffd621397f8, requested=3, >>>>>>>>>> provided=0x7ffd621396d4) at runtime/ompi_mpi_init.c:488 >>>>>>>>>> #12 0x00007f7b9da81399 in PMPI_Init_thread (argc=0x7ffd621396cc, >>>>>>>>>> argv=0x7ffd621396c0, required=3, >>>>>>>>>> provided=0x7ffd621396d4) at pinit_thread.c:69 >>>>>>>>>> #13 0x0000000000401516 in main (argc=3, argv=0x7ffd621397f8) at 
>>>>>>>>>> osu_mbw_mr.c:86 >>>>>>>>>> >>>>>>>>>> George. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Oct 27, 2015 at 2:36 PM, Ralph Castain <r...@open-mpi.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I haven’t been able to replicate this when using the branch in >>>>>>>>>>> this PR: >>>>>>>>>>> >>>>>>>>>>> https://github.com/open-mpi/ompi/pull/1073 >>>>>>>>>>> >>>>>>>>>>> Would you mind giving it a try? It fixes some other race >>>>>>>>>>> conditions and might pick this one up too. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Oct 27, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Okay, I’ll take a look - I’ve been chasing a race condition that >>>>>>>>>>> might be related. >>>>>>>>>>> >>>>>>>>>>> On Oct 27, 2015, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> No, it's using 2 nodes. >>>>>>>>>>> George. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain < >>>>>>>>>>> r...@open-mpi.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> Is this on a single node? >>>>>>>>>>>> >>>>>>>>>>>> On Oct 27, 2015, at 9:25 AM, George Bosilca < >>>>>>>>>>>> bosi...@icl.utk.edu> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I get intermittent deadlocks with the latest trunk. The smallest >>>>>>>>>>>> reproducer is a shell for loop around a small (2 processes) short >>>>>>>>>>>> (20 >>>>>>>>>>>> seconds) MPI application. After a few tens of iterations >>>>>>>>>>>> MPI_Init will >>>>>>>>>>>> deadlock with the following backtrace: >>>>>>>>>>>> >>>>>>>>>>>> #0 0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6 >>>>>>>>>>>> #1 0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6 >>>>>>>>>>>> #2 0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence >>>>>>>>>>>> (procs=0x0, nprocs=0, info=0x7ffd7934fb90, >>>>>>>>>>>> ninfo=1) at src/client/pmix_client_fence.c:100 >>>>>>>>>>>> #3 0x00007fa9498376a2 in pmix1_fence (procs=0x0, >>>>>>>>>>>> collect_data=1) at pmix1_client.c:305 >>>>>>>>>>>> #4 0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, >>>>>>>>>>>> argv=0x7ffd793500a8, requested=3, >>>>>>>>>>>> provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645 >>>>>>>>>>>> #5 0x00007fa94bb77281 in PMPI_Init_thread >>>>>>>>>>>> (argc=0x7ffd7934ff8c, argv=0x7ffd7934ff80, required=3, >>>>>>>>>>>> provided=0x7ffd7934ff94) at pinit_thread.c:69 >>>>>>>>>>>> #6 0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at >>>>>>>>>>>> osu_mbw_mr.c:86 >>>>>>>>>>>> >>>>>>>>>>>> On my machines this is reproducible at 100% after anywhere >>>>>>>>>>>> between 50 and 100 iterations. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> George.