Brad Penoff wrote:
Any objections to us committing an SCTP BTL to ompi-trunk if it has the ompi_ignore file in it first?
I'd like to see this in the trunk, though I'd guess that others will want to know how you plan to support/maintain this code long-term once it's in. I don't think an ompi_ignore is necessary either, as long as your configure checks are right.
Do you have any publications on this work?
For fault tolerance, SCTP connections (termed "associations") can be made aware of multiple interfaces on the endpoints by binding to more than one interface (for performance, the CMT extension uses this multihoming feature to stripe data). SCTP also supports more than one socket API style. Like TCP, there can be a one-to-one socket per connection. Alternatively, like UDP, a single one-to-many socket can be used for all connections. The SCTP BTL can use either socket style, depending on the value of the btl_sctp_if_11 MCA option. When this value is 1, the one-to-one socket is used and, like the TCP BTL, there are as many BTL component modules as there are network cards specified with if_include and friends. By default this value is 0, which means that a single one-to-many socket is used; here only one BTL module is used and, within that one socket, SCTP itself handles all the network cards specified with if_include, etc.
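For readers less familiar with the two socket styles described above, here is a minimal sketch of how they differ at the sockets level. It is not taken from the SCTP BTL; the function names sctp_socket_type and open_sctp_socket are hypothetical, and the one_to_one flag only mirrors the meaning of btl_sctp_if_11 as described in the quoted text.

```c
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <sys/socket.h>   /* socket, SOCK_STREAM, SOCK_SEQPACKET */

/* Hypothetical helper: map the style flag to the socket type.
 * Nonzero selects the TCP-like one-to-one style, zero selects the
 * UDP-like one-to-many style. */
int sctp_socket_type(int one_to_one)
{
    return one_to_one ? SOCK_STREAM : SOCK_SEQPACKET;
}

/* Hypothetical helper: open an IPv4 SCTP socket of the requested style.
 * Returns the descriptor, or -1 with errno set (for example when the
 * kernel has no SCTP support loaded). */
int open_sctp_socket(int one_to_one)
{
    return socket(AF_INET, sctp_socket_type(one_to_one), IPPROTO_SCTP);
}
```

A caller would check the result of open_sctp_socket(0) against -1 before use. With the one-to-many style that single descriptor then carries every association, whereas the one-to-one style needs one descriptor per association.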
Sounds like a good setup. Have you done performance/resource utilization/scaling comparisons of the two approaches, as well as how they compare to the TCP BTL?
Currently, both the one-to-one and the one-to-many styles make use of the event library offered by Open MPI. The callback functions for the one-to-many style, however, differ from the usual ones, because multiple endpoints may be interested in the events that poll returns. For now we use these specialized callback functions, but in the future we hope to explore the potential benefits of a btl_progress function, particularly for the one-to-many style.
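To make the progress-function idea concrete, here is a rough sketch of a single non-blocking progress pass over one descriptor, which is roughly the shape a one-to-many socket would allow. It uses only POSIX poll and read; progress_once, ready_handler_t and count_event are hypothetical names, it does not use Open MPI's actual BTL interfaces, and dispatching to the right endpoint by association is left out.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical handler type: invoked with the readable descriptor. */
typedef void (*ready_handler_t)(int fd, void *ctx);

/* Example handler: consume pending bytes and count the event in *ctx. */
void count_event(int fd, void *ctx)
{
    char buf[256];
    if (read(fd, buf, sizeof buf) > 0)
        ++*(int *)ctx;
}

/* One non-blocking progress pass over a single descriptor.
 * Returns 1 if the handler ran, 0 if nothing was ready, and -1 on a
 * poll error or when only an error/hangup condition was reported. */
int progress_once(int fd, ready_handler_t on_readable, void *ctx)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);   /* timeout 0: never block */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    if (pfd.revents & POLLIN) {
        on_readable(fd, ctx);
        return 1;
    }
    return -1;
}
```

The caller decides when to invoke progress_once, instead of registering a callback with an event library, which is the trade-off under discussion here.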
In my experience the event callbacks have a high overhead compared to a progress function, so I'd say that's definitely worth checking out.
Andrew