Donal Kerr wrote:
>>> order of business after connection establishment
>>> (mba_btl_udapl_sendrecv(). The RECV buffer post for this exchange,
>>> however, should really be done _before_ the
>>> dat_ep_connect() on the active side, and _before_ the
>>> dat_cr_accept() on the server side.
>>> Currently its done after the ESTABLISHED event is dequeued, thus
>>> allowing the race condition.
>>>
>>> I believe the rules are the ULP must ensure that a RECV is posted
>>> before the client can post a SEND for that buffer.
>>> And further, the ULP must enforce flow control somehow so that a
>>> SEND never arrives without a RECV buffer being available.
>>>
>>>
> maybe this is a rule iwarp imposes on its ULPs but not uDAPL.
>
It is most assuredly a rule for uDAPL. And it is not a matter of iWARP "imposing" on uDAPL. uDAPL was explicitly designed to support IB, iWARP and VI, and to do that DAPL documents its model of what RDMA is. This issue is in fact truly fundamental to the efficiency of RDMA: the transport layer DOES NOT provide buffering. That's the application's job. It is precisely because the application layer does that job better that RDMA can achieve better performance at high bandwidth. For reasons discussed in more depth in the RDMA applicability statement and in the RDDP/IPS discussions on iSER, the absence of transport-layer buffer throttling places the onus for end-to-end pacing on the application.

It is a situation somewhat akin to a car with a broken speedometer that had previously only been driven in rush-hour, bumper-to-bumper traffic. There, the broken speedometer was irrelevant. But once that same car hits the open road, the driver will need to come up with some other method of regulating speed.

The DAPL semantics are very clear: send/recv operations must be matched one to one, the receive buffer must be large enough for the received message, and there must be a receive buffer for each incoming send/recv message. That means the sender needs some basis for believing that the RECV has been posted. Usually this is an explicit credit that is decremented per message sent and incremented per response received.

What DAPL does not state is whether the transport does explicit flow control, so that the sending application's work request is simply not processed (and the sending application continues to provide the buffer, as with InfiniBand), or whether the sender simply transmits and leaves error detection to the receiver (iWARP). There are theoretical advantages to both, but more importantly neither of them is going to change. So the Consumer, i.e. the RDMA application, needs to use ULP/application-layer flow control to pace the transmitter.
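To make the credit scheme concrete, here is a minimal single-direction model in C. It is a sketch under stated assumptions, not DAT API code: `receiver_t`, `sender_t` and all the helper functions are hypothetical names. The `overruns` counter stands in for the error an iWARP-style receiver would detect when a SEND arrives with no RECV posted; the InfiniBand-style behavior (the sender's work request simply not being processed) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of ONE direction of a connection; not DAT API code. */
typedef struct {
    int posted_recvs; /* RECV buffers currently posted at the receiver */
    int overruns;     /* SENDs that arrived with no RECV posted
                         (what an iWARP-style receiver would flag as an error) */
} receiver_t;

typedef struct {
    int credits;      /* RECVs the sender believes are posted at the peer */
} sender_t;

/* Receiver posts n RECV buffers; only afterwards may it advertise n credits. */
static void receiver_post(receiver_t *r, int n) {
    assert(n > 0);
    r->posted_recvs += n;
}

/* An incoming SEND consumes one posted RECV, or is counted as an overrun. */
static void receiver_deliver(receiver_t *r) {
    if (r->posted_recvs > 0)
        r->posted_recvs--;
    else
        r->overruns++;
}

/* Credit-paced send: transmit only while holding a credit; one credit per message. */
static bool sender_try_send(sender_t *s, receiver_t *peer) {
    if (s->credits == 0)
        return false; /* must wait for a response that returns a credit */
    s->credits--;
    receiver_deliver(peer);
    return true;
}

/* Receiver re-posts the consumed buffer BEFORE its response hands the
   credit back; the sender increments its count per response. */
static void receiver_respond(receiver_t *r, sender_t *s) {
    receiver_post(r, 1);
    s->credits++;
}

/* For comparison: a sender that ignores credits entirely. */
static void sender_send_unpaced(receiver_t *peer) {
    receiver_deliver(peer);
}
```

Because the receiver re-posts before returning a credit, the credits the sender holds never exceed the RECVs actually posted, so paced sends never overrun. Only the unpaced path can deliver a SEND with no buffer waiting.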
At the application layer, that means the RECV must be posted *before* the Send/accept that grants ULP credits to the far side. All of that should be clear from the IOV ownership rules and the discussion of send/recv semantics. If you saw anything that seemed to imply a guarantee to the contrary, could you point it out in a posting to the DAT reflector (or just send it to me or Arkady Kanevsky)?
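The ordering rule can also be checked mechanically. The C sketch below is hypothetical and is not part of the DAT API: it walks a sequence of connection-setup steps and verifies that the cumulative number of credits granted to the peer never exceeds the cumulative number of RECVs posted. It assumes the ULP treats dat_ep_connect()/dat_cr_accept() as implicitly granting an agreed initial credit count. That is a protocol choice made by the ULP, not something DAT mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels; these are NOT DAT API calls. */
typedef enum {
    STEP_POST_RECV,        /* post `count` RECVs (e.g. via dat_ep_post_recv()) */
    STEP_CONNECT,          /* active side, dat_ep_connect(): assumed to grant `count` credits */
    STEP_ACCEPT,           /* passive side, dat_cr_accept(): assumed to grant `count` credits */
    STEP_SEND_CREDITS,     /* a SEND that carries `count` credits to the peer */
    STEP_WAIT_ESTABLISHED  /* dequeue the ESTABLISHED event; grants nothing */
} step_kind_t;

typedef struct {
    step_kind_t kind;
    int count;
} step_t;

/* Returns true if, at every point, total credits granted <= total RECVs posted. */
static bool order_is_safe(const step_t *steps, size_t n) {
    long posted = 0, granted = 0;
    for (size_t i = 0; i < n; i++) {
        switch (steps[i].kind) {
        case STEP_POST_RECV:
            posted += steps[i].count;
            break;
        case STEP_CONNECT:
        case STEP_ACCEPT:
        case STEP_SEND_CREDITS:
            granted += steps[i].count;
            if (granted > posted)
                return false; /* the peer may SEND before a RECV is posted */
            break;
        case STEP_WAIT_ESTABLISHED:
            break;
        }
    }
    return true;
}
```

For example, the order in the quoted report (accept, dequeue the ESTABLISHED event, then post the RECV) fails this check. Posting the RECVs before the accept passes it.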