Hi Jeff,

I tried to test the latest hg tree, but it fails from time to time. It happens on different machines with different errors (see attached file). It also fails when ib0 is set to slave mode due to bonding, but I am sure that happens "by design".

Lenny.

On 9/29/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> Annnnddd.... the pendulum swings back the other way now. :-)
>
> See the ticket for details: https://svn.open-mpi.org/trac/ompi/ticket/1540
>
> Short version: OMPI now just "figures it out" and does the right thing.
>
>
> On Sep 28, 2008, at 7:27 AM, Jeff Squyres wrote:
>
>> Actually, I thought about this one more, and I have concluded that we do
>> *not* want to do this (i.e., allow RDMA CM to send requests for port A from
>> port B). If we do this, then it would be possible that *all* traffic will go
>> the "wrong" way. More specifically, OMPI will not have direct control over
>> what traffic goes over what port -- and that would be Bad.
>>
>> So we'll still lookup the peer based on the address where the connect
>> request came from, and I'll eventually add a FAQ item about it (because IP
>> addressing is much more flexible than IB addressing, and netadmins may be
>> tempted to use a "flat" address space).
>>
>>
>> On Sep 26, 2008, at 5:53 PM, Jeff Squyres wrote:
>>
>>> On Sep 26, 2008, at 5:45 PM, Jeff Squyres wrote:
>>>
>>>> I actually spent all afternoon diagnosing something that I'll turn into
>>>> a FAQ entry (OMPI's RDMA CM TCP addressing requirements are stronger than
>>>> TCP's legal addressing rules). In short, OMPI needs the RDMA CM to
>>>> guarantee that requests to connect to port A come from port A. If you have
>>>> a "flat" network address space, RDMA CM may actually issue a connect
>>>> request for port A from port B. This causes OMPI to get confused because
>>>> it will not find the right BTL openib endpoint to connect to.
>>>
>>>
>>> And... crap. We can fix this one, too.
>>>
>>> Right now, we use the IP address from the incoming RDMA CM event ID to
>>> determine who the caller is. But we could easily embed the IP address
>>> (i.e., endpoint designator) in the private data in the event so that the
>>> peer can look at *that* address to identify who the peer is (rather than the
>>> address embedded in the event ID).
>>>
>>> This is actually what the IB CM CPC does, IIRC.
>>>
>>> Blah. This is also not hard, but it's another task for later. :-)
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>
> --
> Jeff Squyres
> Cisco Systems
rdma_cm_error.log
Description: Binary data
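The idea Jeff floats on Sep 26 in the quoted thread (identify the caller from an endpoint designator carried in the connect request's private data, instead of from the source address on the RDMA CM event) can be illustrated with a small simulation. This is a hedged sketch only: it does not call librdmacm, and the `Endpoint` class, the 4-byte private-data layout and the function names are hypothetical, not Open MPI's code. In real librdmacm code the sender would hand such a buffer to `rdma_connect()` through the `private_data` / `private_data_len` fields of `struct rdma_conn_param`, and the receiver would read it from `event->param.conn.private_data` on an `RDMA_CM_EVENT_CONNECT_REQUEST` event. Note that Jeff reconsiders this approach on Sep 28 and points to ticket 1540 for the final behavior on Sep 29.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical private-data layout: a single IPv4 address in network byte
# order. This is NOT Open MPI's wire format; it only illustrates the idea.
PRIVATE_DATA_FMT = "!4s"


@dataclass(frozen=True)
class Endpoint:
    """Receiver-side record for one of the remote peer's ports (hypothetical)."""
    name: str
    ip: ipaddress.IPv4Address


def _find(endpoints: Iterable[Endpoint], ip: ipaddress.IPv4Address) -> Optional[Endpoint]:
    """Return the endpoint registered under `ip`, or None if there is none."""
    return next((ep for ep in endpoints if ep.ip == ip), None)


def pack_private_data(intended_src: ipaddress.IPv4Address) -> bytes:
    """Sender side: record the address of the port the sender *meant* to use."""
    return struct.pack(PRIVATE_DATA_FMT, intended_src.packed)


def lookup_by_event_source(endpoints, event_src):
    """Current behavior: trust the source address reported with the CM event."""
    return _find(endpoints, event_src)


def lookup_by_private_data(endpoints, private_data: bytes):
    """Proposed behavior: trust the designator the sender put in private data.

    unpack_from() reads only the first 4 bytes, so trailing padding is ignored.
    """
    (raw,) = struct.unpack_from(PRIVATE_DATA_FMT, private_data)
    return _find(endpoints, ipaddress.IPv4Address(raw))


# The remote peer has two ports on the same "flat" subnet.
peer_ports = [
    Endpoint("port A", ipaddress.IPv4Address("10.0.0.1")),
    Endpoint("port B", ipaddress.IPv4Address("10.0.0.2")),
]

# The sender means to connect from port A, but the request leaves via port B,
# so the CM event reports 10.0.0.2 as the source.
private_data = pack_private_data(ipaddress.IPv4Address("10.0.0.1"))
event_src = ipaddress.IPv4Address("10.0.0.2")

print(lookup_by_event_source(peer_ports, event_src).name)     # port B (wrong endpoint)
print(lookup_by_private_data(peer_ports, private_data).name)  # port A
```

Because the designator travels with the request, the receiver's lookup no longer depends on which physical port the RDMA CM happened to use. The sketch does not address the concern raised on Sep 28, namely that OMPI would lose direct control over which traffic goes over which port.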