On May 9, 2007, at 1:37 AM, Or Gerlitz wrote:

Doing a bit of zoom out from the "how to make ofed's udapl work for ompi" thread, my thinking is that the ompi udapl btl enablement is actually only the first step, where for production/longterm/etc you want to have an rdmacm btl.

I think this is a bit of a misunderstanding. The "BTL" in Open MPI is a byte transfer layer; it is a point-to-point abstraction for moving bytes between two processes. BTL components (read: plugins) are typically distinguished by the underlying protocols used. For example, we have an RC verbs-based BTL and we have a separate uDAPL-based BTL. Andrew is also working on a research-quality UD verbs-based BTL.

Hence, how a particular BTL component makes connections between process peers is really a side-effect of moving bytes around, and not the focus of the BTL. So having a "rdmacm" BTL doesn't really make sense. If both the RC and UD verbs-based BTLs someday use the RDMA CM for connections, we might abstract the connection management out to a common piece of code between the two. But that's a different issue. If we end up having a mixed BTL someday that uses both RC and UD, then the need for the common code may go away. But that's in the future.
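
As a rough illustration only, the separation described above can be sketched in C. Every name here (sketch_btl, sketch_endpoint, sketch_common_connect, and so on) is hypothetical and chosen for this example; none of it is the actual Open MPI BTL interface, and the "connection" is just a flag rather than real RDMA CM or verbs calls. The point is that the component's job is moving bytes, that connecting happens lazily as a side effect of the first send, and that two components could share one connection routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the real Open MPI BTL API. */

/* A peer endpoint: connection state is a detail behind the byte mover. */
struct sketch_endpoint {
    int connected;      /* 0 until a connection has been set up */
    char buf[64];       /* stand-in for the remote process's memory */
    size_t len;         /* number of bytes last delivered */
};

/* What a BTL-like component exposes: its job is moving bytes. */
struct sketch_btl {
    const char *protocol;   /* label only, e.g. "rc-verbs" or "ud-verbs" */
    int (*connect)(struct sketch_endpoint *ep);
    int (*send)(struct sketch_btl *btl, struct sketch_endpoint *ep,
                const void *data, size_t len);
};

/* Connection setup that several components could share (assumed design). */
static int sketch_common_connect(struct sketch_endpoint *ep)
{
    ep->connected = 1;  /* a real component would do its CM handshake here */
    return 0;
}

/* Send connects lazily: connection management is a side effect of sending. */
static int sketch_send(struct sketch_btl *btl, struct sketch_endpoint *ep,
                       const void *data, size_t len)
{
    assert(btl != NULL && ep != NULL);
    if (len > sizeof(ep->buf))
        return -1;                      /* too large for this toy endpoint */
    if (!ep->connected && btl->connect(ep) != 0)
        return -1;                      /* could not reach the peer */
    memcpy(ep->buf, data, len);         /* "move the bytes" */
    ep->len = len;
    return 0;
}

/* Two components that differ by protocol but reuse the same connect code. */
static struct sketch_btl sketch_rc = { "rc-verbs", sketch_common_connect, sketch_send };
static struct sketch_btl sketch_ud = { "ud-verbs", sketch_common_connect, sketch_send };
```

In this sketch a caller only ever asks a component to send; nothing outside the component needs to know how, or whether, a connection was made.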

The reasoning here is made of many arguments; among them, the quickest I can make are:

A) it seems that ompi would want to use not only RC but rather also UD multicast and unicast, which are not covered by udapl

B) there's actually no real justification to maintain two APIs (namely udapl vs libibverbs/librdmacm), so down the road, only one of them would survive (udapl is implemented ***over*** libibverbs/librdmacm, so if the latter dies, so does udapl). Specifically, I hear here and there that the OFED stack is now on its way to be deployed all over the place, specifically in commercial Unix OSs (which want modern! code that supports IPoIB-CM, RDS, SRP, iSER, etc., you name it) so eventually the rdmacm btl can be used also over Solaris et al.

I think that's not quite the point.

1. A piece of history: the uDAPL BTL was originally developed by a grad student just as an excuse to learn the BTL interface and OMPI internals. We already had an RC verbs-based BTL at the time.

2. When Sun joined Open MPI, they took over the development and maintenance of the uDAPL BTL because uDAPL is the only high performance stack on Solaris.

3. It's fine that Sun will someday support the same verbs interface that OFED does. But *today*, they don't. So for their current customers, they need to support uDAPL. As such, we have done little/no testing of uDAPL on OFED since Sun took over the uDAPL BTL -- all testing since that point has been on Solaris uDAPL. All of our Linux/OFED efforts have been on the verbs interface.

4. The Open MPI focus on uDAPL over OFED at the moment is simply to jump-start iWARP testing. Both NetEffect and Chelsio have chimed in to say that they will do the RDMA CM work for Open MPI, but uDAPL can serve as a temporary workaround that is available [effectively] immediately while they get up to speed on the Open MPI code base and do the RDMA CM work.

--
Jeff Squyres
Cisco Systems
