I unfortunately don't have time to look in depth at the patch. But my concern is that currently (today, not at some hypothetical future date) we use the BTLs for more than just MPI point-to-point. The rdma one-sided component (which was added for 1.3 and hopefully will be the default for 1.4) sends messages directly over the BTLs. It would be interesting to know how that is handled.
Brian
On Jan 20, 2009, at 6:53 PM, Jeff Squyres wrote:
This all sounds really great to me. I agree with most of what has
been said -- e.g., benchmarks *are* important. Improving them can
even sometimes have the side effect of improving real
applications. ;-)
My one big concern is the moving of an architectural boundary: making the BTL understand MPI match headers. But even there, I'm torn:
1. I understand why it is better -- performance-wise -- to do this.
And the performance improvement results are hard to argue with. We
took a similar approach with ORTE; ORTE is now OMPI-specific, and
many, many things have become better (from the OMPI perspective, at
least).
2. We all have the knee-jerk reaction that we don't want the BTLs to know anything about MPI semantics, because they've always been that way and it has been a useful abstraction barrier. Now there's even a project afoot to move the BTLs out into a separate layer that cannot know about MPI (so that other things can be built upon it). But are we sacrificing potential MPI performance here? I think that's one important question.
Eugene: you mentioned that there are alternatives to having the BTL understand match headers, such as a callback into the PML. Have you tried this approach to see what the performance cost would be, perchance?
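Just to make sure we're talking about the same shape of change, here is a rough sketch of what I mean by a callback (purely illustrative; none of these types or names are the real PML/BTL interfaces): the PML registers a match callback with the BTL, and the BTL invokes it on an incoming fragment instead of parsing the MPI match header itself.

    /* Hypothetical sketch only -- these are NOT the real OMPI PML/BTL
     * interfaces; every name is invented to illustrate the callback idea. */
    #include <stddef.h>
    #include <stdbool.h>

    struct frag;                      /* opaque incoming fragment */

    /* Callback the PML would register with the BTL.  Returns true if the
     * fragment matched a posted receive and reports where to deliver the
     * payload, so the BTL never interprets MPI match headers itself. */
    typedef bool (*pml_match_cb_t)(struct frag *frag,
                                   void **recv_buf, size_t *recv_len);

    struct btl_module {
        pml_match_cb_t match_cb;      /* set once by the PML at init time */
    };

    /* In the BTL's progress loop: ask the upper layer whether this
     * fragment matches, at the cost of one indirect call per fragment. */
    static void btl_progress_one(struct btl_module *btl, struct frag *frag)
    {
        void *dst = NULL;
        size_t len = 0;
        if (btl->match_cb != NULL && btl->match_cb(frag, &dst, &len)) {
            /* matched: copy payload straight into the user buffer (fast path) */
        } else {
            /* no posted receive yet: hand off to the normal PML callback */
        }
        (void)dst; (void)len;
    }

The open question is whether that extra indirect call per fragment gives back the latency the RFC wins.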
I'd like to see George's reaction to this RFC, and Brian's (if he
has time).
On Jan 20, 2009, at 8:04 PM, Eugene Loh wrote:
Patrick Geoffray wrote:
Eugene Loh wrote:
replace the FIFOs with a single linked list per process in shared memory, with senders to this process adding match envelopes atomically, with each process reading its own linked list (multiple
*) Doesn't strike me as a "simple" change.
Actually, it's much simpler than trying to optimize/scale the N^2
implementation, IMHO.
1) The version I talk about is already done. Check my putbacks. "Already done" is easier! :^)
2) The two ideas are largely orthogonal. The RFC talks about a variety of things: cleaning up the sendi function, moving the sendi call up higher in the PML, bypassing the PML receive-request structure (similar to sendi), and streamlining the data convertors in common cases. Only one part of the RFC (directed polling) overlaps with having a single FIFO per receiver.
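For the "moving the sendi call up higher" piece, the shape of the change is roughly as follows (schematic only; every name below is invented for illustration and is not the actual pml_ob1 or sm code):

    /* Schematic sketch, not real OMPI code: the cheap "send immediate"
     * attempt moves to the very top of the PML send path. */
    #include <stddef.h>

    typedef enum { SENDI_OK, SENDI_FALLBACK } sendi_status_t;

    /* BTL-style "send immediate": succeeds only if the message fits in an
     * eager fragment available right now; otherwise defer to the slow path. */
    static sendi_status_t btl_sendi(int peer, const void *buf, size_t len)
    {
        (void)peer; (void)buf; (void)len;
        return SENDI_FALLBACK;            /* stub for the sketch */
    }

    /* Full path: allocate a send request, run the convertor, schedule. */
    static int pml_send_request_path(int peer, const void *buf, size_t len)
    {
        (void)peer; (void)buf; (void)len;
        return 0;                         /* stub for the sketch */
    }

    int pml_send(int peer, const void *buf, size_t len)
    {
        if (btl_sendi(peer, buf, len) == SENDI_OK) {
            return 0;                     /* done; no request was ever created */
        }
        return pml_send_request_path(peer, buf, len);
    }

The point is simply that the common small-message case returns before any request or convertor work happens; everything else falls through to the existing path.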
*) Not sure this addresses all-to-all well. E.g., let's say you post a receive for a particular source. Do you then wade through a long FIFO to look for your match?
The tradeoff is between demultiplexing by the sender, which costs both time and space, and demultiplexing by the receiver, which costs an atomic increment. ANY_TAG forces you to demultiplex on the receive side anyway. Regarding all-to-all, it won't be more expensive if the receives are pre-posted, and they should be.
Not sure I understand this paragraph. I do, however, think there are great benefits to the single-receiver-queue model. It implies congestion on the receiver side in the many-to-one case, but if a single receiver is reading all those messages anyhow, message processing is already going to throttle the message rate. The extra "bottleneck" at the FIFO might never be seen.
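Concretely, the single-receiver-queue structure would look something like the sketch below (an illustration assuming C11 atomics and envelopes pre-allocated in shared memory, not code from the actual sm BTL): every sender pushes onto the one list owned by the receiver with a single atomic operation, and the receiver scans that list to match on source/tag, which is the same scan ANY_SOURCE/ANY_TAG need anyway.

    /* Sketch only: one multi-writer / single-reader list per receiving
     * process, pushed with a single atomic compare-and-swap. */
    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct envelope {
        int                      src;    /* sending rank                */
        int                      tag;    /* MPI tag                     */
        size_t                   len;    /* payload length              */
        struct envelope *_Atomic next;   /* link in the receiver's list */
    } envelope_t;

    typedef struct {
        envelope_t *_Atomic head;        /* one list for all senders    */
    } recv_queue_t;

    /* Sender side: the "atomic increment" cost is one CAS here,
     * regardless of how many senders there are. */
    void enqueue(recv_queue_t *q, envelope_t *env)
    {
        envelope_t *old = atomic_load_explicit(&q->head, memory_order_relaxed);
        do {
            atomic_store_explicit(&env->next, old, memory_order_relaxed);
        } while (!atomic_compare_exchange_weak_explicit(
                     &q->head, &old, env,
                     memory_order_release, memory_order_relaxed));
    }

    /* Receiver side: demultiplexing happens here, by scanning the one
     * list for a (src, tag) match. */
    envelope_t *find_match(recv_queue_t *q, int src, int tag)
    {
        for (envelope_t *e = atomic_load_explicit(&q->head, memory_order_acquire);
             e != NULL;
             e = atomic_load_explicit(&e->next, memory_order_relaxed)) {
            if (e->src == src && e->tag == tag) {
                return e;                /* unlinking omitted in this sketch */
            }
        }
        return NULL;
    }

(The sketch pushes LIFO for brevity; a real implementation would have to preserve per-sender ordering to honor MPI's non-overtaking rule, and would recycle envelopes from a shared-memory pool.)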
What the RFC talks about is not the last SM development we'll ever need. It's only supposed to be one step forward from where we are today. The "single queue per receiver" approach has many advantages, but I think it's a different topic.
But is this intermediate step worth it, or should we (well, you :-) ) go directly for the single-queue model?
To recap:
1) The work is already done.
2) The single-queue model addresses only one of the RFC's issues.
3) I'm a fan of the single-queue model, but it's just a separate
discussion.
--
Jeff Squyres
Cisco Systems