Ralph,

I just committed r32799 to fix this issue.
I filed a CMR (#4923) and set the target to 1.8.4.

Cheers,

Gilles

On 2014/09/23 22:55, Ralph Castain wrote:
> Thanks! I won't have time to work on it this week, but I appreciate your
> effort. Also, thanks for clarifying the race condition with respect to 1.8;
> I agree it is not a blocker for that release.
>
> Ralph
>
> On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> Here is the patch I am using so far.
>> I will resume working on this on Wednesday (there is still at least one
>> remaining race condition) unless you have time to take care of it
>> today.
>>
>> So far, the race condition has only been observed in practice with the
>> grpcomm/rcd module, which is not the default in v1.8, so IMHO this is not
>> a blocker for v1.8.3.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Gilles - please let me know if/when you think you'll do this. I'm debating
>> adding it to 1.8.3, but don't want to delay that release too long.
>> Alternatively, I can take care of it if you don't have time (I'm asking if 
>> you can do it solely because you have the reproducer).
>>
>>
>> On Sep 21, 2014, at 6:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Sounds fine to me - please go ahead, and thanks
>>>
>>> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Thanks for the pointer, George!
>>>>
>>>> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>> wrote:
>>>> Or copy the handshake protocol design of the TCP BTL...
>>>>
>>>>
>>>> The main difference between oob/tcp and btl/tcp is how they resolve the
>>>> situation in which two processes send their first message to each other at
>>>> the same time.
>>>>
>>>> In oob/tcp, all sockets (i.e. one or two) are closed, and the process with
>>>> the higher vpid is directed to retry establishing the connection.
>>>>
>>>> In btl/tcp, only the redundant connection is closed (i.e. the lower vpid
>>>> closes the socket it connect-ed, and the higher vpid closes the socket it
>>>> accept-ed).
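A minimal C sketch of the btl/tcp-style rule described above, assuming each side knows both vpids and can tell whether a given socket was connect-ed or accept-ed locally. The type sock_origin_t and the functions keep_socket() and surviving_initiator() are hypothetical illustrations, not Open MPI code; the sketch only checks that both ends independently agree to keep the same connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how a socket came into existence, seen from the local side. */
typedef enum { SOCK_CONNECTED, SOCK_ACCEPTED } sock_origin_t;

/* Simultaneous-connect rule as described in this thread: each process ends
 * up with one connect-ed and one accept-ed socket for the same peer.  The
 * lower vpid closes its connect-ed socket and the higher vpid closes its
 * accept-ed socket.  Returns true if the local process keeps this socket. */
static bool keep_socket(unsigned my_vpid, unsigned peer_vpid,
                        sock_origin_t origin)
{
    assert(my_vpid != peer_vpid);
    if (my_vpid < peer_vpid) {
        return origin == SOCK_ACCEPTED;   /* lower vpid keeps what it accepted */
    }
    return origin == SOCK_CONNECTED;      /* higher vpid keeps what it connected */
}

/* Simulate the race where a and b connect to each other at the same time.
 * Returns the vpid that initiated the surviving connection. */
static unsigned surviving_initiator(unsigned a, unsigned b)
{
    /* Connection initiated by a: connect-ed on a, accept-ed on b. */
    bool a_conn_kept_by_a = keep_socket(a, b, SOCK_CONNECTED);
    bool a_conn_kept_by_b = keep_socket(b, a, SOCK_ACCEPTED);
    /* Connection initiated by b: connect-ed on b, accept-ed on a. */
    bool b_conn_kept_by_b = keep_socket(b, a, SOCK_CONNECTED);
    bool b_conn_kept_by_a = keep_socket(a, b, SOCK_ACCEPTED);

    /* Both ends must agree on each connection, and exactly one survives. */
    assert(a_conn_kept_by_a == a_conn_kept_by_b);
    assert(b_conn_kept_by_a == b_conn_kept_by_b);
    assert(a_conn_kept_by_a != b_conn_kept_by_a);

    return a_conn_kept_by_a ? a : b;
}
```

For example, surviving_initiator(0, 1) returns 1: vpid 0 drops the socket it connect-ed, vpid 1 drops the one it accept-ed, and only the connection opened by vpid 1 remains. Because both sides reach the same decision locally, no retry round trip is needed, which is the contrast with oob/tcp drawn in this thread.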
>>>>
>>>>
>>>> My first impression is that oob/tcp is unnecessarily complex and should
>>>> use the simpler and more efficient protocol of btl/tcp.
>>>> That being said, this conclusion could be too naive, and for some good
>>>> reason I am not aware of, the btl/tcp handshake protocol might not be a
>>>> good fit for oob/tcp.
>>>>
>>>> Any thoughts?
>>>>
>>>> Unless told otherwise, I will start revamping oob/tcp tomorrow so that it
>>>> uses the same handshake protocol as btl/tcp.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15885.php
>>
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15895.php
>>
>> <oobtcp2.patch>
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15897.php
>
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15900.php
