Eugene Loh wrote:
Actually, there may be a more important issue here.
Currently, the PML chooses the BTL first. Once the BTL choice is
established, only then does the PML choose between sendi and send.
Currently, it's also the case that we're spending a lot of time in the
PML doing a bunch of stuff that's totally unnecessary if the sendi
succeeds. So, we're neutralizing much of the advantage sendi is
supposed to provide.
So, I'm changing the PML to invoke sendi much sooner. The way I'm
doing this is to loop over BTLs, looking for a sendi that exists and
succeeds. If I find one, I'm done. If I don't, I have to go with the
standard send code path.
The logic, as I just described it, allows that multiple sendi
functions could fail and that the send that is ultimately used might
belong to a different BTL than any of the failing sendi's. This
would suggest that I do NOT want failing sendi's leaving any side
effects (like allocated descriptors).
Is my proposed logic bad? Should I implement things another way?
E.g., if I find a sendi function, use that BTL even if the sendi
failed and another BTL might have a sendi that could succeed? Or,
does my proposed change provide the justification for my pulling
descriptor allocations out of the sendi functions?
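To make "no side effects" concrete, here is a minimal sketch in C of a sendi-style function that releases whatever it allocated before it reports failure. Every name in it (sketch_desc_t, sketch_desc_alloc, sketch_sendi, the link_ready flag, the live-descriptor counter) is hypothetical and exists only for illustration; none of it is the actual BTL interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

/* Hypothetical descriptor; a stand-in, not the real BTL descriptor type. */
typedef struct {
    void  *payload;
    size_t len;
} sketch_desc_t;

/* Number of descriptors currently allocated, so leaks are observable. */
int sketch_live_descriptors = 0;

sketch_desc_t *sketch_desc_alloc(size_t len)
{
    sketch_desc_t *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    d->payload = malloc(len > 0 ? len : 1);
    if (d->payload == NULL) { free(d); return NULL; }
    d->len = len;
    sketch_live_descriptors++;
    return d;
}

void sketch_desc_free(sketch_desc_t *d)
{
    free(d->payload);
    free(d);
    sketch_live_descriptors--;
}

/* Hypothetical sendi: link_ready models "the message can go out right now".
 * On failure the descriptor is released, so the caller inherits nothing. */
int sketch_sendi(bool link_ready, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    sketch_desc_t *d = sketch_desc_alloc(len);
    if (d == NULL) return SKETCH_ERROR;
    if (len > 0) memcpy(d->payload, buf, len);

    if (!link_ready) {
        sketch_desc_free(d);   /* undo the allocation before failing */
        return SKETCH_ERROR;
    }

    /* Pretend the payload went out on the wire; the descriptor is done. */
    sketch_desc_free(d);
    return SKETCH_SUCCESS;
}
```

With this shape, a failed sketch_sendi() on one BTL leaves nothing behind for a send() on a different BTL to trip over. The cost is that the failing path pays for an allocation it then throws away, which is one argument for checking readiness before allocating at all.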
Here's another way of looking at it.
The current PML send code does this:
    set_up_expensive_send_request(&sendreq);
    for ( btl = ... ) {
        if ( SUCCESS == sendi() ) return SUCCESS;
        if ( SUCCESS == send(&sendreq) ) return SUCCESS;
    }
That is, we try one BTL after another. For each one, we try sendi
first. So, each sendi() that fails is immediately followed by a send()
of the same BTL. It's okay for a sendi() to do prep work for the send()
of the same BTL. This scheme does a bunch of expensive send-request
initialization that is unnecessary if the sendi(), which doesn't need
the send request, succeeds.
My proposed PML send logic is this:
    for ( btl = ... ) {
        if ( SUCCESS == sendi() ) return SUCCESS;
    }
    set_up_expensive_send_request(&sendreq);
    for ( btl = ... ) {
        if ( SUCCESS == send(&sendreq) ) return SUCCESS;
    }
That is, if any BTL has a sendi() that succeeds, I'm done. Only if no
sendi() succeeds do I set up the send request and call send() functions.
This is why I would like sendi() functions to have no side effects...
e.g., no allocated descriptors.
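
For anyone who wants to poke at the proposed ordering, here is a self-contained C sketch of the two-pass logic. It is only a model under stated assumptions: sketch_btl_t, sketch_sendreq_t, sketch_pml_send, the toy callbacks, and the setup counter are all hypothetical names made up for this example, and the real PML and BTL signatures are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

/* Hypothetical send request; stands in for the expensive-to-build one. */
typedef struct {
    bool initialized;
} sketch_sendreq_t;

/* Hypothetical BTL module: sendi is optional and may be NULL. */
typedef struct {
    int (*sendi)(const void *buf, size_t len);
    int (*send)(sketch_sendreq_t *req, const void *buf, size_t len);
} sketch_btl_t;

/* Counts how often the expensive setup runs, so the saving is visible. */
int sketch_setup_calls = 0;

void sketch_setup_send_request(sketch_sendreq_t *req)
{
    req->initialized = true;
    sketch_setup_calls++;
}

int sketch_pml_send(sketch_btl_t *btls, size_t n_btls,
                    const void *buf, size_t len)
{
    assert(btls != NULL || n_btls == 0);

    /* Pass 1: try every available sendi before building a send request. */
    for (size_t i = 0; i < n_btls; i++) {
        if (btls[i].sendi != NULL &&
            SKETCH_SUCCESS == btls[i].sendi(buf, len)) {
            return SKETCH_SUCCESS;
        }
    }

    /* Pass 2: no sendi succeeded, so pay for the request and use send. */
    sketch_sendreq_t req = { .initialized = false };
    sketch_setup_send_request(&req);
    for (size_t i = 0; i < n_btls; i++) {
        if (btls[i].send != NULL &&
            SKETCH_SUCCESS == btls[i].send(&req, buf, len)) {
            return SKETCH_SUCCESS;
        }
    }
    return SKETCH_ERROR;
}

/* Toy callbacks used only to exercise the sketch. */
int sketch_sendi_fail(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return SKETCH_ERROR;
}

int sketch_sendi_ok(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return SKETCH_SUCCESS;
}

int sketch_send_ok(sketch_sendreq_t *req, const void *buf, size_t len)
{
    (void)buf; (void)len;
    return req->initialized ? SKETCH_SUCCESS : SKETCH_ERROR;
}
```

Note that in pass 2 the first BTL's send() may run even though it was a later BTL's sendi() that failed last, which is exactly the case where leftover state from a failed sendi() would end up on the wrong BTL.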