Steve Wise wrote:
On Tue, 2007-05-08 at 13:57 -0400, Andrew Friedley wrote:
Steve Wise wrote:
Well, I tried OMPI on OFED-1.2 uDAPL today and it doesn't work. I'm
debugging now.
Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):
/* TODO - big bad evil hack! */
/* uDAPL doesn't ever seem to keep track of ports with addresses. This
becomes a problem when we use dat_ep_query() to obtain a remote address
on an endpoint. In this case, both the DAT_PORT_QUAL and the sin_port
field in the DAT_SOCK_ADDR are 0, regardless of the actual port. This is
a problem when we have more than one uDAPL process per IA - these
processes will have exactly the same address, as the port is all
we have to differentiate who is who. Thus, our uDAPL EP -> BTL EP
matching algorithm will break down.
So, we insert the port we used for our PSP into the DAT_SOCK_ADDR for
this IA. uDAPL then conveniently propagates this to where we need it.
*/
((struct sockaddr_in*)attr.ia_address_ptr)->sin_port = htons(port);
((struct sockaddr_in*)&btl->udapl_addr.addr)->sin_port = htons(port);
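The comment says the port is the only thing that tells two processes on the same IA apart. A minimal lookup sketch makes this concrete. It assumes the BTL keys its endpoints on the (IP address, port) pair. The names `peer_entry`, `match_peer()` and `make_addr()` are hypothetical and are not OMPI's actual BTL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical table entry, one per known remote process. The real OMPI
 * BTL structures differ; this only models the lookup key. */
struct peer_entry {
    struct sockaddr_in addr;   /* address as advertised by the peer */
    int btl_ep_id;             /* stand-in for a BTL endpoint pointer */
};

/* Count entries whose IP address AND port equal the queried address, and
 * store the first match in *out. A count above 1 means the uDAPL endpoint
 * cannot be mapped to a unique BTL endpoint. */
static size_t match_peer(const struct peer_entry *peers, size_t n,
                         const struct sockaddr_in *queried, int *out)
{
    size_t matches = 0;

    assert(peers != NULL && queried != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (peers[i].addr.sin_addr.s_addr == queried->sin_addr.s_addr &&
            peers[i].addr.sin_port == queried->sin_port) {
            if (matches == 0)
                *out = peers[i].btl_ep_id;
            matches++;
        }
    }
    return matches;
}

/* Build an IPv4 address; port is given in host byte order. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    inet_pton(AF_INET, ip, &a.sin_addr);
    return a;
}
```

When the two processes advertise distinct ports, a queried address matches exactly one entry. Suppose instead that dat_ep_query() reports port 0 for every peer. Then both processes on the same host match, and the lookup is ambiguous.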
The OMPI code stuffs the port chosen by udapl for a listening endpoint
into the ia address memory (which is owned by the udapl layer btw).
There's a slight problem with that: the OFA udapl openib_cma code binds
cm_ids to this ia_address regularly. When an HCA is opened, a cm_id is
bound to this address to obtain the local HCA port number and GID that
are being used. In addition, a cm_id is bound to this address each time
an endpoint is created (either at ep_create time or ep_connect time).
So the ia_address field is used by the DAPL CM to create local
cm_ids. Since the port was always zero, the rdma-cma would choose a
unique port for each cm_id bound to that address.
But because OMPI sets the port field to a non-zero value, the rdma_cma
fails all subsequent rdma_bind_addr() calls, since the port is already in use.
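The rdma_cm behaviour described above parallels ordinary socket binding, which may make the failure easier to see. The sketch below is only an analogy and uses plain BSD TCP sockets on loopback rather than librdmacm. Binding with port 0 lets the kernel pick a distinct free port each time, but binding a second socket to a port that is already bound fails with EADDRINUSE. `bind_loopback()` is a hypothetical helper, and the sketch assumes a POSIX system with a loopback interface.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Analogy only: plain TCP sockets, not librdmacm.
 * Bind a new socket to 127.0.0.1:port, where port is in host byte order
 * and 0 asks the kernel to choose a free ephemeral port. Returns the
 * socket fd on success or -errno on failure; on success *bound_port
 * receives the port that was actually assigned. */
static int bind_loopback(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int fd, err;

    assert(bound_port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);

    /* No SO_REUSEADDR: an explicit port that is already bound is refused. */
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        err = errno;
        close(fd);
        return -err;
    }
    *bound_port = ntohs(a.sin_port);
    return fd;
}
```

Two port-0 binds get distinct ports. Reusing the first port explicitly fails with EADDRINUSE, which mirrors the failure above once every cm_id is bound to an address whose port has been forced to the PSP's already-used value.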
Perhaps this hack really is a workaround for a DAPL bug where somebody's
DAPL wasn't tracking port numbers correctly?
Yep. My memory is dim, but I think that was OFED's DAPL, or it was in
the generic part of DAPL that all implementations seem to share.
As hinted by the comment (I wrote it, by the way), I think the best
solution would be if dat_ep_query() returned the port number correctly.
Most of uDAPL seems to just pass around pointers to internal data
structures (which I'm not sure is the best idea in the world), so it
didn't seem like a trivial fix to me at the time. I remember
considering reporting this as a bug, but I didn't because the uDAPL
standard didn't seem to enforce any requirements on passing the port
number around with the address, so it technically wasn't wrong.
Was the OFED uDAPL code switched from something else to RDMA CM at some
point? I'm almost certain I was running fine on OFED's uDAPL at one
point (in fact, a lot of the uDAPL BTL development I did was using the
OFED stack).
Yes, the OFA uDAPL was changed from using the ib-cm to the rdma-cm a
while back. Perhaps you ran on the ib-cm version? And, the rdma-cma
started using port numbers and enforcing uniqueness even more recently, I
think.
Perhaps Don Kerr has some insight into how the Sun uDAPL behaves? Does
OMPI still need this hack?
From what I recall (and Andrew can probably set me straight if I get
this wrong), this hack was included because we were not able to pull the
remote port from dat_ep_query. If dat_ep_query supplies that data, then
we could probably do away with the hack.
I have not heard back from the developer at Sun who implemented uDAPL
for Solaris. My guess is that it was also based on the older ib-cm, but I
will confirm. A while back I submitted a bug against Solaris uDAPL asking
it to provide the port via dat_ep_query, and it looks like it has been
fixed; I just have not tested it because we weren't using it.
-DON
Steve.