On Wed, 2007-05-09 at 17:55 -0700, Andrew Friedley wrote:
>
> Steve Wise wrote:
> > On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
> >> Steve Wise wrote:
> >>> There have been a series of discussions on the ofa general list about
> >>> this issue, and the conclusion to date is that it cannot be resolved in
> >>> the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because
> >>> sending an RDMA message involves the ULP's work queue and completion
> >>> queue, so the CM cannot do this under the covers in a manner that
> >>> doesn't affect the application. Thus, the applications must deal with
> >>> this.
> >> Why can't uDAPL deal with this? As a uDAPL user, I really don't care
> >> what API uDAPL is using under the hood to move data from one place to
> >> another, nor the quirks of that API. The whole point of uDAPL is to
> >> form a network-agnostic abstraction layer. AFAIK, the uDAPL spec
> >> doesn't enforce any such requirement on RDMA communication either. In
> >> my opinion, exposing such behavior above uDAPL is incorrect and is part
> >> of why uDAPL has seen limited adoption -- every single uDAPL
> >> implementation behaves in different ways, making it extremely difficult
> >> to write an application to work on any uDAPL implementation. Sorry if
> >> this sounds harsh, but this comes from many hours of banging my head on
> >> the wall due to working around these sorts of problems :)
> >>
> >
> > I understand your frustration. I think the MPA protocol is deficient in
> > this respect and should have required the necessary "first FPDU" to be
> > sent under the covers by the RNICs. A RTR packet if you will. To
> > resolve this issue "properly", in my opinion, would involve changing the
> > IETF MPA spec and also breaking all the existing iwarp HW. We can't do
> > that.
>
> Understood.
>
> > The reason it is hard or impossible to solve this in the DAPL layer is
> > that any rdma operation on the QP affects the state of that QP and the
> > associated CQs. In addition, if you use an RDMA send to enforce this you
> > impact the other side by consuming a RECV buffer. So it's hard if not
> > impossible to do this under the covers without affecting the
> > application's resources.
>
> Is there no way to do this before passing connection established events
> to the uDAPL consumer? I need to go read up on the uDAPL API to really
> understand why this wouldn't work.
>

Perhaps the dapl or maybe even an OFA iWARP CM could defer passing up the
"established" event on the passive side until an incoming SEND is
detected. I know we've discussed this before, but I'm not sure why this
was not a workable solution. Perhaps Caitlin or some iwarp folks can
recall?

> >
> > Also, the DAPL specification had a goal to not impose any additional
> > protocol on the wire. If you add this under the covers, then you add
> > such a "protocol" and break interoperability between a connection
> > accessed via DAPL on one end and some other API on the other end.
>
> So I guess there's no 'right' solution, at least at the uDAPL level.
> With RDMACM/OFA verbs, there's at least the argument that you can design
> the API/semantics however you please, while uDAPL is already standardized.

Yes, but it's still difficult to post a SEND under the covers because it
consumes the application's resources in the form of QP and CQ space and a
RECV buffer. So to date, we have...punted and pushed the problem to the
ULP.

>
> I hope you guys are documenting this in a way that makes this issue
> extremely clear to both uDAPL and OFA verbs (is this the right naming?)
> users. Maybe it's been done already, but is it possible to emit some
> sort of loud warning/error when the accept()'ing side tries to send
> before a receive?
>

The connection comes tumbling down. How's that for loud? :)

Seriously though, it isn't documented well enough. But we're bleeding
edge here. And I'm still hoping somebody will come up with an elegant
solution that doesn't break interoperability, applications and/or iwarp
hw (I'm a dreamer :).

Steve.
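
For readers wondering what "pushing the problem to the ULP" can look like in practice, here is a rough sketch of one possible application-level convention: the accepting (passive) side holds back its own sends until it has seen a first message from the connecting (active) side. This is only an illustration of the rule discussed in this thread, not code from any uDAPL or OFA implementation. The class name `PassiveSideGate` and the `post_send` callable are hypothetical, and the transport is stubbed out with a plain list. Note that it still relies on the active side sending something first, which costs the passive side a RECV buffer, as mentioned above.

```python
from collections import deque


class PassiveSideGate:
    """Hypothetical ULP-level helper for the accepting (passive) side.

    Holds back outgoing messages until the first message from the
    connecting (active) side has been received.
    """

    def __init__(self, post_send):
        # post_send is a hypothetical callable that hands one message to
        # the real transport (for example, a wrapper around a send call).
        self._post_send = post_send
        self._peer_has_sent = False
        self._pending = deque()

    def send(self, msg):
        """Send now if allowed, otherwise queue until the peer has sent."""
        if self._peer_has_sent:
            self._post_send(msg)
        else:
            self._pending.append(msg)

    def on_receive(self, msg):
        """Call for every incoming message; the first one opens the gate."""
        if not self._peer_has_sent:
            self._peer_has_sent = True
            # Flush queued messages in the order the application sent them.
            while self._pending:
                self._post_send(self._pending.popleft())
        return msg


if __name__ == "__main__":
    wire = []                 # stand-in for the real transport
    gate = PassiveSideGate(wire.append)
    gate.send("reply-1")      # queued: the peer has not sent yet
    gate.on_receive("hello")  # first incoming message flushes the queue
    gate.send("reply-2")      # passed straight through
    print(wire)
```

The queue keeps the application's send order intact, so the only visible effect on the passive side is a delay until the first receive completes.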