Hi Jeff,

I tried to test the latest hg tree, but it fails from time to time.

It happens on different machines with different errors (see attached file).

It also fails when ib0 is set to slave mode due to bonding, but I am sure
that this happens "by design".

Lenny.

On 9/29/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> Annnnddd.... the pendulum swings back the other way now.  :-)
>
> See the ticket for details: https://svn.open-mpi.org/trac/ompi/ticket/1540
>
> Short version: OMPI now just "figures it out" and does the right thing.
>
>
> On Sep 28, 2008, at 7:27 AM, Jeff Squyres wrote:
>
>  Actually, I thought about this one more, and I have concluded that we do
>> *not* want to do this (i.e., allow RDMA CM to send requests for port A from
>> port B).  If we do this, then it would be possible that *all* traffic will go
>> the "wrong" way.  More specifically, OMPI will not have direct control over
>> what traffic goes over what port -- and that would be Bad.
>>
>> So we'll still look up the peer based on the address where the connect
>> request came from, and I'll eventually add a FAQ item about it (because IP
>> addressing is much more flexible than IB addressing, and netadmins may be
>> tempted to use a "flat" address space).
>>
>>
>>
>> On Sep 26, 2008, at 5:53 PM, Jeff Squyres wrote:
>>
>>  On Sep 26, 2008, at 5:45 PM, Jeff Squyres wrote:
>>>
>>>  I actually spent all afternoon diagnosing something that I'll turn into
>>>> a FAQ entry (OMPI's RDMA CM TCP addressing requirements are stronger than
>>>> TCP's legal addressing rules).  In short, OMPI needs the RDMA CM to
>>>> guarantee that requests to connect to port A come from port A.  If you have
>>>> a "flat" network address space, RDMA CM may actually issue a connect
>>>> request for port A from port B.  This causes OMPI to get confused because
>>>> it will not find the right BTL openib endpoint to connect to.
>>>>
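A minimal sketch of why identifying the caller by source address breaks. It assumes, hypothetically, that a listener keeps one endpoint record per peer port, keyed by the IP address the peer advertised for that port. The names `Endpoint` and `lookup_by_source` and the 10.0.0.x addresses are illustrative only, not OMPI code. The point is that once routing in a flat address space sends a port-A request out of port B, the source address no longer matches anything the listener expects.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a BTL openib endpoint: one per peer port."""
    peer_port: str                   # label, e.g. "A" or "B"
    peer_ip: ipaddress.IPv4Address   # address the peer advertised for that port


def lookup_by_source(endpoints: list[Endpoint],
                     src_ip: ipaddress.IPv4Address) -> Optional[Endpoint]:
    """Identify the caller purely from the source address of the connect request."""
    for ep in endpoints:
        if ep.peer_ip == src_ip:
            return ep
    return None


# Endpoints known to the listener on local port A: only the peer's port A.
listener_a = [Endpoint("A", ipaddress.IPv4Address("10.0.0.1"))]

# Normal case: the request for port A leaves the peer through port A.
print(lookup_by_source(listener_a, ipaddress.IPv4Address("10.0.0.1")))

# Flat address space: routing sends the same request out of port B,
# so it arrives with port B's source address and the lookup finds nothing.
print(lookup_by_source(listener_a, ipaddress.IPv4Address("10.0.0.2")))
```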
>>>
>>>
>>> And... crap.  We can fix this one, too.
>>>
>>> Right now, we use the IP address from the incoming RDMA CM event ID to
>>> determine who the caller is.  But we could easily embed the IP address
>>> (i.e., endpoint designator) in the private data in the event so that the
>>> peer can look at *that* address to identify who the peer is (rather than the
>>> address embedded in the event ID).
>>>
>>> This is actually what the IB CM CPC does, IIRC.
>>>
>>> Blah.  This is also not hard, but it's another task for later.  :-)
>>>
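The private-data idea can be sketched as follows. In librdmacm, the active side can attach bytes through the `private_data` / `private_data_len` fields of the `struct rdma_conn_param` passed to `rdma_connect()`. The listener then sees those bytes in `event->param.conn.private_data` on an `RDMA_CM_EVENT_CONNECT_REQUEST` event. The 6-byte layout here (IPv4 address plus a 16-bit port index) and the helper names are hypothetical assumptions for illustration. They are not what OMPI or the IB CM CPC actually sends.

```python
import ipaddress
import struct
from typing import Optional

# Hypothetical private-data layout: the caller's own IPv4 address
# (its "endpoint designator") followed by a 16-bit port index,
# both in network byte order.
_PRIVATE_DATA = struct.Struct("!4sH")


def pack_private_data(my_ip: str, my_port_index: int) -> bytes:
    """Caller side: build the bytes that would ride in the connect request."""
    return _PRIVATE_DATA.pack(ipaddress.IPv4Address(my_ip).packed, my_port_index)


def unpack_private_data(blob: bytes) -> tuple[ipaddress.IPv4Address, int]:
    """Listener side: recover who the caller says it is.

    unpack_from ignores trailing bytes, in case the received buffer is
    longer than what the caller actually wrote.
    """
    raw_ip, port_index = _PRIVATE_DATA.unpack_from(blob)
    return ipaddress.IPv4Address(raw_ip), port_index


def identify_caller(endpoints: dict[ipaddress.IPv4Address, str],
                    blob: bytes) -> Optional[str]:
    """Look the caller up by the address carried in the private data,
    not by the source address reported for the incoming event."""
    claimed_ip, _ = unpack_private_data(blob)
    return endpoints.get(claimed_ip)


endpoints = {ipaddress.IPv4Address("10.0.0.1"): "peer port A"}
blob = pack_private_data("10.0.0.1", 0)  # the peer identifies itself as port A
# Even if the request physically arrived from 10.0.0.2 (port B),
# the private data still names port A.
print(identify_caller(endpoints, blob))
```

Because the listener trusts the address in the private data rather than the event's source address, a request for port A that is routed out of port B still maps to the port-A endpoint.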
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
> --
> Jeff Squyres
> Cisco Systems
>
>

Attachment: rdma_cm_error.log
Description: Binary data
