Re: [OMPI devel] Threaded progress for CPCs

Gleb Natapov Tue, 20 May 2008 06:02:27 -0400

On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote:
> >> 5. ...?
> > What about moving the posting of receive buffers into the main thread? With
> > SRQ it is easy: don't post anything in the CPC thread. The main thread will
> > prepost buffers automatically after the first fragment is received on the
> > endpoint (in btl_openib_handle_incoming()). With PPRQ it's more
> > complicated. What if we prepost dummy buffers (not from the free list)
> > during the IBCM connection stage and run another three-way handshake
> > protocol using those buffers, but from the main thread? We will need to
> > prepost one buffer on the active side and two buffers on the passive
> > side.
> 
> 
> This is probably the most viable alternative -- it would be easiest if  
> we did this for all CPC's, not just for IBCM:
> 
> - for PPRQ: CPCs only post a small number of receive buffers, suitable  
> for another handshake that will run in the upper-level openib BTL
> - for SRQ: CPCs don't post anything (because the SRQ already "belongs"  
> to the upper level openib BTL)
> 
> Do we have a BSRQ restriction that there *must* be at least one PPRQ?   
No. We don't have such a restriction, and I wouldn't want to add one.


> If so, we could always run the upper-level openib BTL
> really-post-the-buffers handshake over the smallest buffer size BSRQ RC
> PPRQ (i.e., have the CPC post a single receive on this QP -- see below),
> which would make things much easier.  If we don't already have this
> restriction, would we mind adding it?  We have one PPRQ in our default
> receive_queues value, anyway.
If there is no PPRQ, then we can rely on the RNR/retransmit logic in case
there are not enough buffers in the SRQ. We do that anyway in the openib BTL code.
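
To illustrate the RNR/retransmit fallback mentioned above, here is a minimal
simulation in C. It is only a sketch: on real hardware the retry is done by
the HCA, not by software like this, and every name here (struct srq,
srq_try_receive, send_with_rnr_retry, and the repost parameters) is
hypothetical and not taken from the openib BTL.

```c
#include <stdbool.h>

/* Hypothetical model of a shared receive queue: just a buffer count. */
struct srq {
    int posted;
};

/* Receiver side: consume one posted buffer, or return false to model an
 * RNR NAK (receiver not ready) when nothing is posted. */
static bool srq_try_receive(struct srq *q)
{
    if (q->posted == 0) {
        return false;
    }
    q->posted--;
    return true;
}

/* Sender side: retry after each RNR NAK, up to max_retries retries.
 * To model the main thread reposting buffers, repost_count buffers are
 * added just before attempt number repost_at_attempt (assumed behavior,
 * chosen only to make the simulation deterministic).
 * Returns the number of retries that were needed, or -1 on failure. */
static int send_with_rnr_retry(struct srq *q, int max_retries,
                               int repost_at_attempt, int repost_count)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (attempt == repost_at_attempt) {
            q->posted += repost_count;
        }
        if (srq_try_receive(q)) {
            return attempt;
        }
    }
    return -1;
}
```

With an empty SRQ that gets one buffer reposted before the third attempt,
the send succeeds after two retries. If nothing is reposted within the retry
budget, the send fails.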

> 
> With this rationale, once the CPC says "ok, all BSRQ QP's are  
> connected", then _endpoint.c can run a CTS handshake to post the  
> "real" buffers, where each side does the following:
> 
> - CPC calls _endpoint_connected() to tell the upper level BTL that it  
> is fully connected (the function is invoked in the main thread)
> - _endpoint_connected() posts all the "real" buffers to all the BSRQ  
> QP's on the endpoint
> - _endpoint_connected() then sends a CTS control message to remote  
> peer via smallest RC PPRQ
> - upon receipt of CTS:
>    - release the buffer (***)
>    - set endpoint state of CONNECTED and let all pending messages  
> flow... (as it happens today)
> 
> So it actually doesn't even have to be a handshake -- it's just an  
> additional CTS sent over the newly-created RC QP.  Since it's RC, we  
> don't have to do much -- just wait for the CTS to know that the remote  
> side has actually posted all the receives that we expect it to have.   
> Since the CTS flows over a PPRQ, there's no issue about receiving the  
> CTS on an SRQ (because the SRQ may not have any buffers posted at any  
> given time).
Correct. A full handshake is not needed. The trick is to allocate those
initial buffers in a smart way. IMO the initial buffer should be very
small (a couple of bytes only) and be preallocated at endpoint creation.
This will solve the locking problem.
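
A rough sketch of the CTS flow described above, in C. It is a toy model, not
the openib BTL code: struct endpoint, its fields, and all function names are
hypothetical. The CTS "send" is modeled as an immediate call on the peer,
and the two-byte buffer size is only an example of "a couple of bytes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ep_state { EP_CONNECTING, EP_CONNECTED };

/* Hypothetical endpoint; the CTS buffer lives inside the endpoint so the
 * CPC thread never has to allocate anything when posting it. */
struct endpoint {
    enum ep_state state;
    struct endpoint *peer;
    unsigned char cts_buf[2];  /* tiny, preallocated at endpoint creation */
    bool cts_buf_posted;       /* posted by the CPC on the PPRQ */
    bool real_bufs_posted;     /* posted later by the main thread */
    size_t pending_sends;      /* sends queued while not CONNECTED */
    size_t flushed_sends;      /* sends that actually went out */
};

static void endpoint_init(struct endpoint *ep, struct endpoint *peer)
{
    ep->state = EP_CONNECTING;
    ep->peer = peer;
    ep->cts_buf[0] = ep->cts_buf[1] = 0;
    ep->cts_buf_posted = false;
    ep->real_bufs_posted = false;
    ep->pending_sends = 0;
    ep->flushed_sends = 0;
}

/* CPC thread: post only the preallocated CTS receive buffer. */
static void cpc_post_cts_buffer(struct endpoint *ep)
{
    ep->cts_buf_posted = true;
}

/* Main thread: the peer's CTS arrived. Release the CTS buffer, mark the
 * endpoint CONNECTED, and let the pending sends flow. */
static void endpoint_recv_cts(struct endpoint *ep)
{
    assert(ep->cts_buf_posted);
    ep->cts_buf_posted = false;
    ep->state = EP_CONNECTED;
    ep->flushed_sends += ep->pending_sends;
    ep->pending_sends = 0;
}

/* Main thread: the CPC reported that all QPs are connected. Post the
 * "real" buffers, then send the CTS (delivered immediately in this model). */
static void endpoint_connected(struct endpoint *ep)
{
    ep->real_bufs_posted = true;
    endpoint_recv_cts(ep->peer);
}

/* Queue the send until the peer's CTS has been seen. */
static bool endpoint_send(struct endpoint *ep)
{
    if (ep->state != EP_CONNECTED) {
        ep->pending_sends++;
        return false;
    }
    ep->flushed_sends++;
    return true;
}
```

In this model each side becomes CONNECTED only when it receives the peer's
CTS, so a send attempted earlier is queued and flushed once the CTS arrives.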

--
                        Gleb.
