Do you have a reproducer you can share for testing this? I'm unable to get it 
to happen on my machine, but perhaps you have test code that triggers it so I 
can continue debugging.

Ralph

On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet 
<gilles.gouaillar...@iferc.org> wrote:

> Thanks Ralph,
> 
> This is much better, but there is still a bug:
> with the very same scenario I described earlier, vpid 2 does not send
> its message to vpid 3 once the connection has been established.
> 
> I tried to debug it, but I have been pretty unsuccessful so far.
> 
> vpid 2 calls tcp_peer_connected and executes the following snippet:
> 
> if (NULL != peer->send_msg && !peer->send_ev_active) {
>     opal_event_add(&peer->send_event, 0);
>     peer->send_ev_active = true;
> }
> 
> but when evmap_io_active is invoked later, the following part:
> 
>    TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
>        if (ev->ev_events & events)
>            event_active_nolock(ev, ev->ev_events & events, 1);
>    }
> 
> finds only one ev (the one for mca_oob_tcp_recv_handler, and *no*
> mca_oob_tcp_send_handler).
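> 
> As a point of comparison, here is a minimal, self-contained libevent sketch (plain
> sockets rather than Open MPI code; the socketpair setup and callback names are just
> for illustration) of the behaviour the snippet above relies on: when a read event
> and a write event are both registered on the same fd, one pass of the event loop
> should activate both callbacks.
> 
> #include <event2/event.h>
> #include <stdio.h>
> #include <sys/socket.h>
> 
> static void on_read(evutil_socket_t fd, short what, void *arg)
> {
>     printf("read event fired\n");
> }
> 
> static void on_write(evutil_socket_t fd, short what, void *arg)
> {
>     printf("write event fired\n");
> }
> 
> int main(void)
> {
>     int sv[2];
>     struct event_base *base = event_base_new();
> 
>     /* a connected socketpair: sv[0] is immediately writable */
>     socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
> 
>     struct event *rev = event_new(base, sv[0], EV_READ,  on_read,  NULL);
>     struct event *wev = event_new(base, sv[0], EV_WRITE, on_write, NULL);
> 
>     event_add(rev, NULL);
>     event_add(wev, NULL);    /* analogous to opal_event_add(&peer->send_event, 0) */
> 
>     send(sv[1], "x", 1, 0);  /* make sv[0] readable as well as writable */
> 
>     /* one pass of the loop; both callbacks are expected to run */
>     event_base_loop(base, EVLOOP_ONCE);
> 
>     event_free(rev);
>     event_free(wev);
>     event_base_free(base);
>     return 0;
> }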
> 
> I will resume my investigation tomorrow.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/09/17 4:01, Ralph Castain wrote:
>> Hi Gilles
>> 
>> I took a crack at solving this in r32744 - CMRd it for 1.8.3 and assigned it 
>> to you for review. Give it a try and let me know if I (hopefully) got it.
>> 
>> The approach we have used in the past is to have both sides close their 
>> connections, and then have the higher vpid retry while the lower one waits. 
>> The logic for that was still in place, but it looks like you are hitting a 
>> different code path, and I found another potential one as well. So I think I 
>> plugged the holes, but will wait to hear if you confirm.
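>> 
>> A rough standalone sketch of that tie-break rule (illustrative only; the vpid type
>> is a stand-in and this is not the actual r32744 code):
>> 
>> #include <stdio.h>
>> #include <stdint.h>
>> 
>> typedef uint32_t vpid_t;                 /* stand-in for orte_vpid_t */
>> 
>> enum collision_action { RETRY_CONNECT, WAIT_FOR_PEER };
>> 
>> /* Both sides close the duplicate connection, then the higher vpid
>>  * retries the connect while the lower vpid waits to be contacted. */
>> static enum collision_action resolve_collision(vpid_t me, vpid_t peer)
>> {
>>     return (me > peer) ? RETRY_CONNECT : WAIT_FOR_PEER;
>> }
>> 
>> int main(void)
>> {
>>     printf("vpid 2 vs 3: %s\n",
>>            resolve_collision(2, 3) == RETRY_CONNECT ? "retry" : "wait");
>>     printf("vpid 3 vs 2: %s\n",
>>            resolve_collision(3, 2) == RETRY_CONNECT ? "retry" : "wait");
>>     return 0;
>> }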
>> 
>> Thanks
>> Ralph
>> 
>> On Sep 16, 2014, at 6:27 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>> 
>>> Ralph,
>>> 
>>> Here is the full description of a race condition in oob/tcp that I very briefly 
>>> mentioned in a previous post:
>>> 
>>> The race condition can occur when two orted daemons that are not yet connected 
>>> try to send a message to each other for the first time, at the same time.
>>> 
>>> This can occur when running an MPI hello world on 4 nodes with the grpcomm/rcd 
>>> module.
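>>> 
>>> An invocation along these lines should exercise it (hostnames and the binary path 
>>> are placeholders; grpcomm/rcd is selected via the usual MCA parameter):
>>> 
>>> mpirun -np 4 --host node1,node2,node3,node4 --mca grpcomm rcd ./hello_world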
>>> 
>>> Here is a scenario in which the race condition occurs:
>>> 
>>> orted vpid 2 and 3 enter the allgather
>>> /* they are not yet oob/tcp connected */
>>> and they each call orte.send_buffer_nb to the other.
>>> From a libevent point of view, vpid 2 and 3 will both call 
>>> mca_oob_tcp_peer_try_connect.
>>> 
>>> vpid 2 calls mca_oob_tcp_send_handler
>>> 
>>> vpid 3 calls connection_event_handler
>>> 
>>> Depending on the value returned by random() in libevent, vpid 3 will
>>> either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely).
>>> If vpid 3 calls recv_handler, it will close the two sockets to vpid 2.
>>> 
>>> Then vpid 2 will call mca_oob_tcp_recv_handler
>>> (peer->state is MCA_OOB_TCP_CONNECT_ACK),
>>> which will invoke mca_oob_tcp_recv_connect_ack.
>>> tcp_peer_recv_blocking will fail
>>> /* zero bytes are recv'ed, since vpid 3 previously closed the socket before 
>>> writing a header */
>>> and this is handled by mca_oob_tcp_recv_handler as a fatal error
>>> /* ORTE_FORCED_TERMINATE(1) */.
>>> 
>>> Could you please have a look at it?
>>> 
>>> If you are too busy, could you please advise on how this scenario should be 
>>> handled differently?
>>> - should vpid 3 keep one socket instead of closing both and retrying?
>>> - should vpid 2 handle the failure as a non-fatal error? (see the sketch just below)
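>>> 
>>> To illustrate that second option: in this scenario a zero-byte recv simply means 
>>> the peer closed the socket during the handshake, which could be treated as a 
>>> "close and retry/wait" condition rather than as fatal. A standalone sketch of the 
>>> situation with plain sockets (not the actual oob/tcp handler):
>>> 
>>> #include <stdio.h>
>>> #include <sys/socket.h>
>>> #include <unistd.h>
>>> 
>>> int main(void)
>>> {
>>>     int sv[2];
>>>     char hdr[16];
>>> 
>>>     socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
>>>     close(sv[1]);                  /* peer closes before writing a header */
>>> 
>>>     ssize_t rc = recv(sv[0], hdr, sizeof(hdr), 0);
>>>     if (0 == rc) {
>>>         /* peer closed during the handshake: recoverable, so retry or wait
>>>          * instead of ORTE_FORCED_TERMINATE(1) */
>>>         printf("connection lost during connect ack, will retry\n");
>>>     } else if (rc < 0) {
>>>         perror("recv");            /* genuine error path */
>>>     }
>>> 
>>>     close(sv[0]);
>>>     return 0;
>>> }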
>>> 
>>> Cheers,
>>> 
>>> Gilles
> 
