On Aug 13, 2007, at 11:28 AM, George Bosilca wrote:

Such a scheme is certainly possible, but I see even less use for it
than use cases for the existing microbenchmarks.  Specifically,
header caching *can* happen in real applications (i.e., repeatedly
send short messages with the same MPI signature), but repeatedly
sending to the same peer with exactly the same signature *and*
exactly the same "long-enough" data (i.e., more than a small number
of ints that an app could use for its own message data caching) is
indicative of a poorly-written MPI application IMHO.

If you look at the message size distribution for most of the HPC
applications (at least the ones that get investigated in the papers) you
will see that very small messages are only an insignificant
percentage of messages.

This would be different from what Patrick has told us about Myricom's analysis of real-world MPI applications, and from one of the strong points of QLogic's HCAs (that it's all about short message latency / injection rate; bandwidth issues are [at least currently] secondary). :-)

As this "optimization" only addresses these
kinds of messages, I doubt there is any real benefit from an application's
point of view (obviously there will be a few exceptions, as usual). The
header caching only makes sense for very small messages (MVAPICH only
implements header caching for messages up to 155 bytes [that's less
than 20 doubles] if I remember correctly), which makes it a real benchmark
optimization.

I don't have enough data to say. But I'm sure there are at least *some* applications out there that would benefit from it. Probably somewhere between 1 and 99%. ;-)

But just to reiterate/be clear: my goal here is to reduce latency. If header caching is not the way to go, then so be it.

--
Jeff Squyres
Cisco Systems
