On Tue, 3 Mar 2009, Jeff Squyres wrote:
On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote:
First, this behavior is basically what I was proposing and what George
didn't feel comfortable with. It is arguably no compromise at all. (Uggh,
why must I be so honest?) For eager messages, it favors BTLs with sendi
functions, which could lead to those BTLs becoming overloaded. I think
favoring BTLs with sendi for short messages is good. George thinks that
load balancing BTLs is good.
Second, the implementation can be simpler than you suggest:
*) You don't need a separate list since testing for a sendi-enabled BTL is
relatively cheap (I think... could verify).
*) You don't need to shuffle the list. The mechanism used by ob1 just
resumes the BTL search from the last BTL used. E.g., check
https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/pml/ob1/pml_ob1_sendreq.h#mca_pml_ob1_send_request_start
. You use mca_bml_base_btl_array_get_next(&btl_eager) to round-robin over
BTLs in a totally fair manner (remembering where the last loop left off),
and mca_bml_base_btl_array_get_size(&btl_eager) to make sure you
don't loop endlessly.
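Here is a minimal C sketch of the simpler scheme described above. The types and functions (`btl_t`, `btl_array_t`, `btl_array_get_next()`, `select_eager_btl()`) are hypothetical stand-ins, not the real ob1/BML structures. `btl_array_get_next()` only mimics the resume-where-you-left-off behavior attributed to `mca_bml_base_btl_array_get_next()`, and the `has_sendi` flag stands in for whatever check ob1 uses to detect a sendi-enabled BTL. The sketch scans at most `size` BTLs, starting where the previous scan stopped, and takes the first one with sendi ("early sendi"). If none has sendi, it falls back to plain round-robin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BTL entry; not the real BML type. */
typedef struct {
    const char *name;
    bool has_sendi;          /* true if this BTL offers a sendi path */
} btl_t;

/* Hypothetical stand-in for one peer's eager BTL array. */
typedef struct {
    btl_t *btls;
    size_t size;
    size_t index;            /* where the previous scan left off */
} btl_array_t;

/* Return the current BTL and advance the cursor, wrapping around.
 * Mimics the round-robin behavior described for
 * mca_bml_base_btl_array_get_next(). */
static btl_t *btl_array_get_next(btl_array_t *arr)
{
    assert(arr->size > 0);
    btl_t *btl = &arr->btls[arr->index];
    arr->index = (arr->index + 1) % arr->size;
    return btl;
}

/* "Early sendi": visit each BTL at most once, starting where the last
 * call stopped, and return the first one that has sendi. If none does,
 * the cursor is back at its starting point, so fall back to plain
 * round-robin by taking the next BTL. Returns NULL for an empty array. */
static btl_t *select_eager_btl(btl_array_t *arr)
{
    if (arr->size == 0)
        return NULL;
    for (size_t i = 0; i < arr->size; i++) {
        btl_t *btl = btl_array_get_next(arr);
        if (btl->has_sendi)
            return btl;
    }
    return btl_array_get_next(arr);
}
```

Consider BTLs {tcp0, sm, tcp1} where only sm has sendi. In that case every call returns sm, which is exactly the overload concern George raised. If no BTL has sendi, successive calls simply rotate tcp0, tcp1, tcp0, and so on.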
Cool / fair enough.
How about an MCA parameter to switch between this mechanism (early sendi) and
the original behavior (late sendi)?
This is the usual way that we resolve "I want to do X / I want to do Y"
disputes. :-)
Of all the options presented, this is the one I dislike most :).
This is *THE* critical path of the OB1 PML. It's already horribly complex
and hard to follow (as Eugene is finding out the hard way). Making it
more complex as a way to settle this argument is pain and suffering just
to avoid conflict.
However, another option just occurred to me. If (AND ONLY IF) ob1/r2
detects that there are at least two BTLs to the same peer at the same
priority, at least one with a sendi and at least one without, what about
an MCA parameter to disable all sendi functions to that peer?
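The detection rule can be sketched in C as follows. Everything here is hypothetical. `peer_btl_t` is an illustrative type, not existing Open MPI code. Its `priority` field stands in for however ob1/r2 ranks the BTLs to a peer. The `mca_disable_mixed_sendi` flag stands in for the proposed, as-yet-unnamed MCA parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer BTL descriptor; not the real r2/BML type. */
typedef struct {
    int priority;            /* BTLs with equal values share load balancing */
    bool has_sendi;          /* true if this BTL offers a sendi path */
} peer_btl_t;

/* Sketch of the proposed rule: when the (hypothetical) MCA switch is on,
 * disable sendi to a peer only if some priority level contains both a
 * sendi-capable BTL and a BTL without sendi. Homogeneous and single-BTL
 * peers are left alone, so they keep the sendi fast path. */
static bool should_disable_sendi(const peer_btl_t *btls, size_t n,
                                 bool mca_disable_mixed_sendi)
{
    assert(btls != NULL || n == 0);
    if (!mca_disable_mixed_sendi)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!btls[i].has_sendi)
            continue;
        /* Look for a BTL at the same priority that lacks sendi. */
        for (size_t j = 0; j < n; j++) {
            if (!btls[j].has_sendi && btls[j].priority == btls[i].priority)
                return true;
        }
    }
    return false;
}
```

Consider two BTLs at priority 10 where only one has sendi: they trigger the rule. The rule does not fire for two sendi-capable BTLs at priority 10. It also does not fire for a sendi BTL at priority 20 next to a non-sendi BTL at priority 10.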
There's only a 1% gain from the FAIR protocol Eugene proposed, so we'd lose
that 1% in the heterogeneous multi-NIC case (the least common case).
There would be a much bigger gain in the homogeneous multi-NIC case where
every BTL has sendi, and in the single-NIC case (both much more common),
because the FAST protocol would be used there.
That way, we get the FAST protocol in all cases for sm, which is what I
really want ;).
Brian