Do you have a reproducer you can share for testing this? I'm unable to get it to happen on my machine, but perhaps you have test code that triggers it so I can continue debugging.
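In the meantime I've been trying to trigger it with a minimal hello world across 4 nodes, forcing the grpcomm/rcd module per your scenario, with no luck so far. For reference, this is roughly what I'm running (the hostfile and the MCA selection line are my assumptions; adjust to your setup):

    /* hello.c - minimal reproducer attempt: each rank just prints and exits.
     * Build:  mpicc hello.c -o hello
     * Run (forcing grpcomm/rcd, one rank per node across 4 nodes):
     *   mpirun --mca grpcomm rcd -np 4 --map-by node --hostfile hosts ./hello
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }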
Ralph

On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Thanks Ralph,
>
> this is much better, but there is still a bug:
> with the very same scenario I described earlier, vpid 2 does not send
> its message to vpid 3 once the connection has been established.
>
> I tried to debug it, but I have been pretty unsuccessful so far.
>
> vpid 2 calls tcp_peer_connected and executes the following snippet:
>
>     if (NULL != peer->send_msg && !peer->send_ev_active) {
>         opal_event_add(&peer->send_event, 0);
>         peer->send_ev_active = true;
>     }
>
> but when evmap_io_active is invoked later, the following part:
>
>     TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
>         if (ev->ev_events & events)
>             event_active_nolock(ev, ev->ev_events & events, 1);
>     }
>
> finds only one ev (mca_oob_tcp_recv_handler and *no*
> mca_oob_tcp_send_handler).
>
> I will resume my investigations tomorrow.
>
> Cheers,
>
> Gilles
>
> On 2014/09/17 4:01, Ralph Castain wrote:
>> Hi Gilles
>>
>> I took a crack at solving this in r32744 - CMRd it for 1.8.3 and assigned
>> it to you for review. Give it a try and let me know if I (hopefully) got it.
>>
>> The approach we have used in the past is to have both sides close their
>> connections, and then have the higher vpid retry while the lower one waits.
>> The logic for that was still in place, but it looks like you are hitting a
>> different code path, and I found another potential one as well. So I think
>> I plugged the holes, but will wait to hear if you confirm.
>>
>> Thanks
>> Ralph
>>
>> On Sep 16, 2014, at 6:27 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Ralph,
>>>
>>> here is the full description of a race condition in oob/tcp I very
>>> briefly mentioned in a previous post:
>>>
>>> the race condition can occur when two not-yet-connected orteds try to
>>> send a message to each other for the first time, and at the same time.
>>>
>>> that can occur when running an MPI hello world on 4 nodes with the
>>> grpcomm/rcd module.
>>>
>>> here is a scenario in which the race condition occurs:
>>>
>>> orted vpids 2 and 3 enter the allgather
>>> /* they are not yet oob/tcp connected */
>>> and they call orte.send_buffer_nb to each other.
>>> from a libevent point of view, vpids 2 and 3 will call
>>> mca_oob_tcp_peer_try_connect.
>>>
>>> vpid 2 calls mca_oob_tcp_send_handler.
>>>
>>> vpid 3 calls connection_event_handler.
>>>
>>> depending on the value returned by random() in libevent, vpid 3 will
>>> either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely).
>>> if vpid 3 calls recv_handler, it will close the two sockets to vpid 2.
>>>
>>> then vpid 2 will call mca_oob_tcp_recv_handler
>>> (peer->state is MCA_OOB_TCP_CONNECT_ACK),
>>> which will invoke mca_oob_tcp_recv_connect_ack.
>>> tcp_peer_recv_blocking will fail
>>> /* zero bytes are recv'ed since vpid 3 previously closed the socket
>>> before writing a header */
>>> and this is handled by mca_oob_tcp_recv_handler as a fatal error
>>> /* ORTE_FORCED_TERMINATE(1) */
>>>
>>> could you please have a look at it?
>>>
>>> if you are too busy, could you please advise where this scenario should
>>> be handled differently?
>>> - should vpid 3 keep one socket instead of closing both and retrying?
>>> - should vpid 2 handle the failure as a non-fatal error?
>>>
>>> Cheers,
>>>
>>> Gilles
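For anyone tracing the same code path: opal_event_add is essentially a thin wrapper over the embedded libevent's event_add, so once it returns successfully the send event should appear in the per-fd event list that evmap_io_active walks. The expected behavior, reduced to a standalone libevent 2.x test on a pipe fd (a minimal sketch; none of this is Open MPI code), looks like this:

    /* evtest.c - register a write event on the write end of a pipe and
     * confirm the callback fires.  Build: cc evtest.c -levent */
    #include <event2/event.h>
    #include <stdio.h>
    #include <unistd.h>

    static void send_cb(evutil_socket_t fd, short what, void *arg)
    {
        /* Once event_add has registered the event, the evmap lookup inside
         * libevent finds it and this runs as soon as fd is writable. */
        printf("send event fired on fd %d (what=0x%x)\n", (int)fd, (int)what);
        event_base_loopbreak((struct event_base *)arg);
    }

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0) return 1;

        struct event_base *base = event_base_new();
        /* Analogous to arming peer->send_event: a one-shot EV_WRITE event. */
        struct event *ev = event_new(base, fds[1], EV_WRITE, send_cb, base);
        event_add(ev, NULL);       /* registers ev in the base's I/O event map */
        event_base_dispatch(base); /* pipe write end is writable, so cb fires */

        event_free(ev);
        event_base_free(base);
        close(fds[0]); close(fds[1]);
        return 0;
    }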
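As for the connection race itself, the recovery Ralph describes above ("have both sides close their connections, and then have the higher vpid retry while the lower one waits") amounts to a deterministic tie-break on vpid ordering. A minimal sketch of that logic, with invented names that do not match the actual oob/tcp symbols:

    /* tiebreak.c - illustrative sketch only; all names are made up. */
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { STATE_CLOSED, STATE_CONNECTING,
                   STATE_WAITING_FOR_RETRY } peer_state_t;

    typedef struct {
        uint32_t vpid;       /* the peer's vpid */
        peer_state_t state;
    } peer_t;

    /* Stub: in real code this would close the colliding sockets. */
    static void close_sockets(peer_t *p) { p->state = STATE_CLOSED; }

    /* Stub: in real code this would start a fresh connect(). */
    static void retry_connect(peer_t *p) { p->state = STATE_CONNECTING; }

    /* Both sides drop the duplicate connection; only the higher vpid
     * retries, while the lower one waits for the peer's retry. */
    static void handle_collision(uint32_t my_vpid, peer_t *peer)
    {
        close_sockets(peer);
        if (my_vpid > peer->vpid)
            retry_connect(peer);                   /* higher vpid retries  */
        else
            peer->state = STATE_WAITING_FOR_RETRY; /* lower vpid waits     */
    }

    int main(void)
    {
        peer_t p3 = { .vpid = 3, .state = STATE_CONNECTING };
        handle_collision(2, &p3);  /* vpid 2 vs vpid 3: vpid 2 waits */
        printf("vpid 2 state toward vpid 3: %s\n",
               p3.state == STATE_WAITING_FOR_RETRY ? "waiting" : "retrying");
        return 0;
    }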