Re: [OMPI devel] collective problems

Patrick Geoffray Thu, 8 Nov 2007 15:12:11 -0500

Hi Gleb,

Gleb Natapov wrote:

> In the case of TCP, kernel is kind enough to progress message for you,
> but only if there was enough space in a kernel internal buffers. If there
> was no place there, TCP BTL will also buffer messages in userspace and
> will, eventually, have the same problem.

Occasionally buffering to hide a flow-control issue is fine, assuming that there is a mechanism to flush the buffer (below). However, you cannot buffer everything, and it is just as fine to expose the back pressure when the buffer space is exhausted, to show the application that there is a sustained problem. In this case, it is reasonable to block the application (i.e. the MPI request) while you cannot buffer the outgoing data.
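
As an illustration of the point above, here is a minimal C sketch of a bounded send buffer that absorbs short bursts and reports back pressure once its space is exhausted. It is not Open MPI or MX code: sendbuf_t, sendbuf_try_send, sendbuf_flush and the 8-byte capacity are all hypothetical names and values, and the "wire" is simulated by a credit count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENDBUF_CAPACITY 8  /* hypothetical: bytes of staging space */

/* Hypothetical staging buffer owned by the transport layer. */
typedef struct {
    unsigned char data[SENDBUF_CAPACITY];
    size_t used;            /* bytes currently buffered */
} sendbuf_t;

enum { SEND_BUFFERED = 0, SEND_WOULD_BLOCK = 1 };

/* Buffer the message if it fits; otherwise expose back pressure so the
 * caller can keep the request pending instead of buffering without bound. */
static int sendbuf_try_send(sendbuf_t *b, const void *msg, size_t len)
{
    assert(b->used <= SENDBUF_CAPACITY);
    if (len > SENDBUF_CAPACITY - b->used)
        return SEND_WOULD_BLOCK;
    memcpy(b->data + b->used, msg, len);
    b->used += len;
    return SEND_BUFFERED;
}

/* Flush mechanism: drain up to `credits` bytes, as if the network had
 * accepted them. Returns the number of bytes drained. */
static size_t sendbuf_flush(sendbuf_t *b, size_t credits)
{
    size_t n = credits < b->used ? credits : b->used;
    memmove(b->data, b->data + n, b->used - n);
    b->used -= n;
    return n;
}
```

A caller that gets SEND_WOULD_BLOCK would leave the request incomplete and retry after a flush has freed space, which is the "block the MPI request" behaviour described above.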

The progression of already buffered outgoing data is the real problem, not the buffering itself.

Here, the proposal is to allow the BTL to buffer, but to require the PML to handle progress. That's broken, IMHO.

> To progress such outstanding messages additional thread is needed in
> userspace. Is this what MX does?

MX uses a user-level thread, but it's mainly for progressing the higher-level protocol on the receive side. On the send side, for the low-level protocol, it is easier to ask your driver to either wake you up when the sending resource is available again (blocking on a CQ for IB) or take care of the sending itself.


<usual rant>

My overall problem with this proposal is a race to the bottom, based on the lowest BTL, functionality-wise. The PML already imposes pipelining for large messages (with a few knobs, but still) when most protocols in other BTLs already have their own. Now it's flow-control progression (not MPI progression).

Can each BTL implement what is needed for a particular back-end instead of bloating the upper layer?

</usual rant>

Patrick
