Ralph,

i just committed r32799 in order to fix this issue.
i cmr'ed (#4923) and set the target for 1.8.4

Cheers,

Gilles

On 2014/09/23 22:55, Ralph Castain wrote:
> Thanks! I won't have time to work on it this week, but appreciate your
> effort. Also, thanks for clarifying the race condition vis 1.8 - I agree it
> is not a blocker for that release.
>
> Ralph
>
> On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> here is the patch i am using so far.
>> i will resume working on this from Wednesday (there is at least one
>> race condition remaining) unless you have the time to take care of it
>> today.
>>
>> so far, the race condition has only been observed in real life with the
>> grpcomm/rcd module, and this is not the default in v1.8, so imho this is not
>> a blocker for v1.8.3
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Gilles - please let me know if/when you think you'll do this. I'm debating
>> about adding it to 1.8.3, but don't want to delay that release too long.
>> Alternatively, I can take care of it if you don't have time (I'm asking if
>> you can do it solely because you have the reproducer).
>>
>>
>> On Sep 21, 2014, at 6:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Sounds fine with me - please go ahead, and thanks
>>>
>>> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Thanks for the pointer George !
>>>>
>>>> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu>
>>>> wrote:
>>>> Or copy the handshake protocol design of the TCP BTL...
>>>>
>>>>
>>>> the main difference between oob/tcp and btl/tcp is the way we resolve the
>>>> situation in which two processes send their first message to each other at
>>>> the same time.
>>>>
>>>> in oob/tcp, all (i.e. one or two) sockets are closed and the higher vpid
>>>> is directed to retry establishing a connection.
>>>>
>>>> in btl/tcp, the useless socket is closed (i.e. the one that was connect-ed
>>>> on the lower vpid and the one that was accept-ed on the higher vpid).
>>>>
>>>>
>>>> my first impression is that oob/tcp is unnecessarily complex and it should
>>>> use the simpler and more efficient protocol of btl/tcp.
>>>> that being said, this conclusion could be too naive and, for some good
>>>> reasons i am not aware of, the btl/tcp handshake protocol might not be a
>>>> good fit for oob/tcp.
>>>>
>>>> any thoughts ?
>>>>
>>>> i will revamp oob/tcp in order to use the same btl/tcp handshake protocol
>>>> from tomorrow unless indicated otherwise
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15885.php
>>
>> <oobtcp2.patch>
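
For readers following the thread, the two tie-break rules described above for a simultaneous first message can be sketched in C roughly as follows. This is only an illustration of the rules as worded in the quoted message, not code taken from Open MPI or from the attached patch: the type sock_origin_t and the functions btl_style_should_close() and oob_style_should_retry() are hypothetical names invented for this sketch, and vpids are assumed to be plain unsigned integers that always differ between the two peers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: how a socket came to exist on the local process. */
typedef enum { SOCK_CONNECTED, SOCK_ACCEPTED } sock_origin_t;

/*
 * btl/tcp-style rule as described in the thread: when both peers connect
 * at the same time, only the redundant connection is dropped. Its two
 * endpoints are the connect-ed socket on the lower vpid and the accept-ed
 * socket on the higher vpid. Assumes my_vpid != peer_vpid.
 */
static bool btl_style_should_close(uint32_t my_vpid, uint32_t peer_vpid,
                                   sock_origin_t origin)
{
    if (my_vpid < peer_vpid) {
        return origin == SOCK_CONNECTED; /* lower vpid drops its own connect */
    }
    return origin == SOCK_ACCEPTED;      /* higher vpid drops what it accepted */
}

/*
 * oob/tcp-style rule as described in the thread: every socket between the
 * two peers is closed, and only the higher vpid retries the connection.
 */
static bool oob_style_should_retry(uint32_t my_vpid, uint32_t peer_vpid)
{
    return my_vpid > peer_vpid;
}
```

With vpids 0 and 1, the first rule leaves exactly one connection alive (the one that vpid 1 connect-ed and vpid 0 accept-ed) without any retry, whereas the second rule tears everything down and costs an extra connection attempt from vpid 1.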