Re: [OMPI devel] collective problems

Jeff Squyres Wed, 7 Nov 2007 22:27:07 -0500

On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote:

Remember that this is all in the context of Galen's proposal for
btl_send() to be able to return NOT_ON_WIRE -- meaning that the send
was successful, but it has not yet been sent (e.g., openib BTL
buffered it because it ran out of credits).

Sorry if I miss something obvious, but why does the PML have to be aware
of the flow control situation of the BTL? If the BTL cannot send
something right away for any reason, it should be the responsibility of
the BTL to buffer it and to progress on it later.

That's currently the way it is. But the BTL currently only has the option to say two things:


1. "ok, done!" -- then the PML will think that the request is complete

2. "doh -- error!" -- then the PML thinks that Something Bad Happened(tm)


What we really need is for the BTL to have a third option:

3. "not done yet!"

So that the PML knows that the request is not yet done, but will allow
other things to progress while we're waiting for it to complete.
Without this, the openib BTL currently replies "ok, done!", even when
it has only buffered a message (rather than actually sending it out).
This optimization works great (yeah, I know...) except for apps that
don't dip into the MPI library frequently. :-\
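
To make the three-way return concrete, here is a rough sketch in C. It is
not the actual BTL or PML interface: every name in it (the sketch_* types
and functions, the credit counter) is made up for illustration, and the
"wire" is just a pair of counters. It only shows how a send that was
buffered for lack of credits could leave the request incomplete until a
later progress call flushes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the Open MPI API. */
typedef enum {
    SKETCH_SEND_DONE = 0,        /* 1. "ok, done!"     */
    SKETCH_SEND_ERROR = -1,      /* 2. "doh -- error!" */
    SKETCH_SEND_NOT_ON_WIRE = 1  /* 3. "not done yet!" */
} sketch_send_status_t;

/* Toy BTL state: flow-control credits and a count of buffered sends. */
typedef struct {
    int credits;
    int buffered;
} sketch_btl_t;

/* Toy PML request: only tracks completion and failure. */
typedef struct {
    bool complete;
    bool failed;
} sketch_request_t;

/* Send if a credit is available; otherwise buffer and say so. */
static sketch_send_status_t sketch_btl_send(sketch_btl_t *btl,
                                            const void *buf, size_t len)
{
    if (buf == NULL || len == 0) {
        return SKETCH_SEND_ERROR;
    }
    if (btl->credits > 0) {
        btl->credits--;
        return SKETCH_SEND_DONE;
    }
    btl->buffered++;
    return SKETCH_SEND_NOT_ON_WIRE;
}

/* Add newly arrived credits and flush as many buffered sends as they
 * allow. Returns the number of sends flushed. */
static int sketch_btl_progress(sketch_btl_t *btl, int new_credits)
{
    int flushed = 0;
    btl->credits += new_credits;
    while (btl->buffered > 0 && btl->credits > 0) {
        btl->buffered--;
        btl->credits--;
        flushed++;
    }
    return flushed;
}

/* The PML marks the request complete only when the BTL says "done". */
static void sketch_pml_send(sketch_btl_t *btl, sketch_request_t *req,
                            const void *buf, size_t len)
{
    switch (sketch_btl_send(btl, buf, len)) {
    case SKETCH_SEND_DONE:
        req->complete = true;
        req->failed = false;
        break;
    case SKETCH_SEND_NOT_ON_WIRE:
        req->complete = false;  /* still pending; progress it later */
        req->failed = false;
        break;
    default:
        req->complete = false;
        req->failed = true;
        break;
    }
}
```

With one credit available, a first sketch_pml_send() completes its request
and a second one leaves its request pending until sketch_btl_progress() is
called with fresh credits. How the PML would then learn that the buffered
send finally went out (a callback, polling, etc.) is left out of the sketch
on purpose.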


--
Jeff Squyres
Cisco Systems
