Richard Graham wrote:
Re: [OMPI devel] RFC: sm Latency
On 1/20/09 2:08 PM, "Eugene Loh" <eugene....@sun.com> wrote:
Richard Graham wrote:
First, the performance improvements look really nice.
A few questions:
- How much of an abstraction violation does this introduce?
It doesn't need to be much of an abstraction violation at all, if by that
we mean teaching the BTL about the match header. We just need to make some
choices, and I flagged that one for better visibility.
>> I really don't see how teaching the BTL about matching will help much
>> (it will save a subroutine call). As I understand the proposal, you aim
>> to selectively pull items out of the FIFOs; this will break the FIFOs, as
>> they assume contiguous entries. Logic to manage holes will need to be added.
No. It's still a FIFO. You look at the tail of the FIFO. If you can
handle what you see there, you pop that item off and handle it. If you
can't, you punt and return control to the ULP, who handles things the
traditional (and heavier-weight) method. If the item of interest isn't
at the tail, you won't see it.
This looks like the BTL needs to start "knowing" about MPI-level semantics.
That's one option. There are other options.
>> Such as?
PML callback. Jeff's question about how much performance (if any) one
loses with a callback is a good one. If I were less lazy (and had more
infinite time), I would have tested that before sending out the RFC.
As it was, I wanted to see how much pushback there would be on the
"abstraction violation" issue. Enough, it turns out, to try the
experiment. I'll try to test it out and report back.
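One way the PML-callback option could look, as a hedged sketch: the BTL holds an opaque function pointer plus a context pointer registered by the PML, and hands the header at the FIFO tail to it without interpreting the bytes itself. `btl_t`, `fast_cb_t`, `btl_register_fast_cb`, `btl_offer_tail`, `match_hdr_t`, `pml_state_t`, and `pml_fast_match` are hypothetical names, not Open MPI's real interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: the upper layer inspects an opaque header and
 * returns true if it consumed the fragment. The BTL never looks inside. */
typedef bool (*fast_cb_t)(const void *hdr, size_t len, void *ctx);

typedef struct {
    fast_cb_t cb; /* registered by the PML; NULL means "no fast path" */
    void *ctx;    /* PML-private state handed back on each call */
} btl_t;

static void btl_register_fast_cb(btl_t *btl, fast_cb_t cb, void *ctx) {
    assert(btl != NULL);
    btl->cb = cb;
    btl->ctx = ctx;
}

/* BTL side: offer the fragment at the FIFO tail. Returns true if it was
 * consumed (caller pops it), false to punt to the normal path. */
static bool btl_offer_tail(const btl_t *btl, const void *hdr, size_t len) {
    assert(btl != NULL);
    if (btl->cb == NULL) return false;
    return btl->cb(hdr, len, btl->ctx);
}

/* PML side: all MPI matching knowledge lives here. */
typedef struct { int src, tag; } match_hdr_t;                  /* hypothetical */
typedef struct { int want_src, want_tag, hits; } pml_state_t;  /* hypothetical */

static bool pml_fast_match(const void *hdr, size_t len, void *ctx) {
    pml_state_t *st = ctx;
    if (len < sizeof(match_hdr_t)) return false;
    const match_hdr_t *h = hdr;
    if (h->src != st->want_src || h->tag != st->want_tag) return false;
    st->hits++;
    return true;
}
```

In this shape the cost Jeff asked about is one indirect call per fragment offered; whether that is measurable next to an inlined check is what the experiment would have to show.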
If you replace the FIFOs with a single linked list per process in shared
memory, with senders to this process adding match envelopes atomically and
each process reading its own linked list (multiple writers and a single
reader in the non-threaded case), there will be only one place to poll,
regardless of the number of procs involved in the run.
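The single-list-per-process idea above is essentially a multiple-producer, single-consumer queue. A minimal sketch with C11 atomics follows, assuming plain pointers within one address space (a real shared-memory segment would likely need offsets instead, since processes can map it at different addresses); `envelope_t`, `inbox_t`, `inbox_push`, and `inbox_drain` are hypothetical names. Senders push with a compare-and-swap loop; the one reader detaches the whole list with a single atomic exchange and reverses it into arrival order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical match envelope. In real shared memory, `next` would be an
 * offset into the segment rather than a raw pointer. */
typedef struct envelope {
    struct envelope *next;
    int src, tag;
} envelope_t;

/* One inbox per receiving process: many writers, one reader. */
typedef struct {
    _Atomic(envelope_t *) top;
} inbox_t;

/* Writer side: lock-free push via compare-and-swap (Treiber-style).
 * On CAS failure `old` is refreshed, so we relink and retry. */
static void inbox_push(inbox_t *in, envelope_t *e) {
    assert(in != NULL && e != NULL);
    envelope_t *old = atomic_load_explicit(&in->top, memory_order_relaxed);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
        &in->top, &old, e, memory_order_release, memory_order_relaxed));
}

/* Reader side: the single place to poll. Atomically detach the whole
 * list, then reverse it so envelopes come out in push (FIFO) order. */
static envelope_t *inbox_drain(inbox_t *in) {
    assert(in != NULL);
    envelope_t *lifo =
        atomic_exchange_explicit(&in->top, NULL, memory_order_acquire);
    envelope_t *fifo = NULL;
    while (lifo != NULL) {
        envelope_t *next = lifo->next;
        lifo->next = fifo;
        fifo = lifo;
        lifo = next;
    }
    return fifo;
}
```

Because the reader takes the entire list at once instead of popping one node at a time, this sidesteps the ABA problem that affects single-node pops on a lock-free stack, and each sender's envelopes still come out in the order that sender pushed them.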
*) That doesn't strike me as a "simple" change.
Let me be clear that I can see many benefits to this approach and don't
think it's prohibitively hard. So, I'm not trying to shoot this
approach down entirely. I do have the proposed approach implemented,
though, and it seems like a smaller change in behavior from what we
have today, and many of the optimizations are unrelated to polling (and
hence to the "single queue" proposal).