On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote:
> >> 5. ...?
> >
> > What about moving posting of receive buffers into main thread. With
> > SRQ it is easy: don't post anything in CPC thread. Main thread will
> > prepost buffers automatically after first fragment received on the
> > endpoint (in btl_openib_handle_incoming()). With PPRQ it's more
> > complicated. What if we'll prepost dummy buffers (not from free list)
> > during IBCM connection stage and will run another three way handshake
> > protocol using those buffers, but from the main thread. We will need
> > to prepost one buffer on the active side and two buffers on the
> > passive side.
> >
> This is probably the most viable alternative -- it would be easiest if
> we did this for all CPC's, not just for IBCM:
>
> - for PPRQ: CPCs only post a small number of receive buffers, suitable
>   for another handshake that will run in the upper-level openib BTL
> - for SRQ: CPCs don't post anything (because the SRQ already "belongs"
>   to the upper level openib BTL)
>
> Do we have a BSRQ restriction that there *must* be at least one PPRQ?

No. We don't have such a restriction and I wouldn't want to add it.

> If so, we could always run the upper-level openib BTL
> really-post-the-buffers handshake over the smallest buffer size BSRQ
> RC PPRQ (i.e., have the CPC post a single receive on this QP -- see
> below), which would make things much easier. If we don't already have
> this restriction, would we mind adding it? We have one PPRQ in our
> default receive_queues value, anyway.

If there is no PPRQ then we can rely on the RNR/retransmit logic in case
there are not enough buffers in the SRQ. We do that anyway in the openib
BTL code.

>
> With this rationale, once the CPC says "ok, all BSRQ QP's are
> connected", then _endpoint.c can run a CTS handshake to post the
> "real" buffers, where each side does the following:
>
> - CPC calls _endpoint_connected() to tell the upper level BTL that it
>   is fully connected (the function is invoked in the main thread)
> - _endpoint_connected() posts all the "real" buffers to all the BSRQ
>   QP's on the endpoint
> - _endpoint_connected() then sends a CTS control message to remote
>   peer via smallest RC PPRQ
> - upon receipt of CTS:
>   - release the buffer (***)
>   - set endpoint state of CONNECTED and let all pending messages
>     flow... (as it happens today)
>
> So it actually doesn't even have to be a handshake -- it's just an
> additional CTS sent over the newly-created RC QP. Since it's RC, we
> don't have to do much -- just wait for the CTS to know that the remote
> side has actually posted all the receives that we expect it to have.
> Since the CTS flows over a PPRQ, there's no issue about receiving the
> CTS on an SRQ (because the SRQ may not have any buffers posted at any
> given time).

Correct. A full handshake is not needed. The trick is to allocate those
initial buffers in a smart way. IMO the initial buffer should be very
small (a couple of bytes only) and be preallocated at endpoint creation.
This will solve the locking problem.

--
			Gleb.
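
For illustration only, here is a rough sketch of the CTS flow discussed
above, written as a tiny in-memory model rather than real verbs code.
Every name in it (struct endpoint, endpoint_init, endpoint_send,
endpoint_recv_cts, the EP_* states) is hypothetical and is not taken from
the openib BTL; only the name endpoint_connected echoes the
_endpoint_connected() function mentioned in the thread. It assumes the
one small CTS buffer is set up at endpoint creation, and it models
"sending the CTS" as a direct call into the peer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical endpoint states; not the real openib BTL enum. */
enum ep_state { EP_CONNECTING, EP_CONNECTED };

struct endpoint {
    enum ep_state state;
    bool cts_buf_posted;    /* tiny buffer, preallocated at creation      */
    bool real_bufs_posted;  /* "real" receives posted on all BSRQ QPs     */
    int pending;            /* messages queued until the CTS arrives      */
    int delivered;          /* messages that landed in our real buffers   */
    struct endpoint *peer;
};

/* Endpoint creation: the small CTS receive buffer exists from the start,
 * so the connection code never has to touch the free list. */
void endpoint_init(struct endpoint *ep, struct endpoint *peer)
{
    ep->state = EP_CONNECTING;
    ep->cts_buf_posted = true;
    ep->real_bufs_posted = false;
    ep->pending = 0;
    ep->delivered = 0;
    ep->peer = peer;
}

/* Send one message, or queue it if the peer's CTS has not arrived yet. */
void endpoint_send(struct endpoint *ep)
{
    if (ep->state == EP_CONNECTED) {
        assert(ep->peer->real_bufs_posted);
        ep->peer->delivered++;
    } else {
        ep->pending++;
    }
}

/* Receipt of CTS: release the small buffer, mark the endpoint connected,
 * and let all pending messages flow. */
static void endpoint_recv_cts(struct endpoint *ep)
{
    assert(ep->cts_buf_posted);
    ep->cts_buf_posted = false;
    ep->state = EP_CONNECTED;
    while (ep->pending > 0) {
        ep->pending--;
        assert(ep->peer->real_bufs_posted);
        ep->peer->delivered++;
    }
}

/* Called in the main thread once the CPC reports that all QPs are
 * connected: post the real buffers, then send the CTS to the peer. */
void endpoint_connected(struct endpoint *ep)
{
    ep->real_bufs_posted = true;
    endpoint_recv_cts(ep->peer);  /* models the CTS over the RC PPRQ */
}
```

In this model an endpoint only starts sending after it has seen the
peer's CTS, which is sent strictly after the peer posted its real
buffers, so no message can arrive before a receive is posted. Calling
endpoint_connected() on both sides, in either order, is enough; there is
no reply to the CTS.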