Steve Wise wrote:

On Tue, 2007-05-08 at 13:57 -0400, Andrew Friedley wrote:
Steve Wise wrote:
Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work.  I'm
debugging now.

Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):

   /* TODO - big bad evil hack! */
   /* uDAPL doesn't ever seem to keep track of ports with addresses.  This
      becomes a problem when we use dat_ep_query() to obtain a remote address
      on an endpoint.  In this case, both the DAT_PORT_QUAL and the sin_port
      field in the DAT_SOCK_ADDR are 0, regardless of the actual port. This is
      a problem when we have more than one uDAPL process per IA - these
      processes will have exactly the same address, as the port is all
      we have to differentiate who is who.  Thus, our uDAPL EP -> BTL EP
      matching algorithm will break down.

      So, we insert the port we used for our PSP into the DAT_SOCK_ADDR for
      this IA.  uDAPL then conveniently propagates this to where we need it.
    */
   ((struct sockaddr_in*)attr.ia_address_ptr)->sin_port = htons(port);
   ((struct sockaddr_in*)&btl->udapl_addr.addr)->sin_port = htons(port);
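
To illustrate the matching problem the comment describes, here is a minimal sketch in plain C. It is not the OMPI matching code; `same_endpoint` and `make_addr` are hypothetical helpers invented for this example. It only shows that two addresses on the same IP with a zero port compare as identical, while distinct ports tell them apart.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: returns 1 when two IPv4 addresses match on both
   IP and port, 0 otherwise. */
static int same_endpoint(const struct sockaddr_in *a,
                         const struct sockaddr_in *b)
{
    return a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}

/* Hypothetical helper: build an IPv4 address from dotted-quad text and a
   host-order port. */
static struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    inet_pton(AF_INET, ip, &sa.sin_addr);
    sa.sin_port = htons(port);   /* stored in network byte order */
    return sa;
}
```

With port 0 on both sides, two processes behind the same IA address are indistinguishable to a comparison like this, which is the situation the hack works around.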

The OMPI code stuffs the port chosen by udapl for a listening endpoint
into the ia address memory (which is owned by the udapl layer btw).
There's a slight problem with that:  The OFA udapl openib_cma code binds
cm_id's to this ia_address regularly.  When an hca is opened, a cm_id is
bound to this address to obtain the local hca port number and gid that
is being used.  In addition, a cm_id is bound to this address each time
an endpoint is created (either at ep_create time or ep_connect time).
So that ia_address field is used by the dapl cm to create local
cm_ids...  Since the port was always zero, the rdma-cma would choose a
unique port for each cm_id bound to that address.
But once OMPI sets the port field to non-zero, the rdma_cma fails all the
subsequent rdma_bind_addr() calls since the port is already in use.
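
The same behaviour can be shown with ordinary TCP sockets, as an analogy only. This sketch does not use librdmacm or rdma_bind_addr(); `bind_loopback` is a hypothetical helper, and the assumption is that the port-in-use failure described above is of the same kind as the usual sockets `EADDRINUSE` case. Binding with port 0 lets the kernel pick a free port each time, while binding a second socket to a port that is already bound fails.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to 127.0.0.1:port.
   port is in host byte order; 0 asks the kernel to choose a free port.
   Returns the socket fd, or -1 with errno set.  On success *bound_port
   holds the port actually bound. */
static int bind_loopback(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        int saved = errno;   /* keep the bind/getsockname error across close() */
        close(fd);
        errno = saved;
        return -1;
    }

    *bound_port = ntohs(sa.sin_port);
    return fd;
}
```

Calling `bind_loopback(0, &p)` twice should give two different ports. Calling it again with the first of those ports, while that socket is still open, should return -1 with `errno` set to `EADDRINUSE`.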

Perhaps this hack really is a workaround for a DAPL bug where somebody's
DAPL wasn't tracking port numbers correctly?
Yep. My memory is dim, but I think that was OFED's DAPL, or it was in the generic part of DAPL that all implementations seem to share.

As hinted by the comment (I wrote it by the way), I think the best solution would be if dat_ep_query() returned the port number correctly. Most of uDAPL seems to just pass around pointers to internal data structures (which I'm not sure is the best idea in the world), so it didn't seem like a trivial fix to me at the time. I remember considering reporting this as a bug, but I didn't because the uDAPL standard didn't seem to enforce any requirements on passing the port number around with the address, so it technically wasn't wrong.

Was the OFED uDAPL code switched from something else to RDMA CM at some point? I'm almost certain I was running fine on OFED's uDAPL at one point (in fact, a lot of the uDAPL BTL development I did was using the OFED stack).

Yes, the OFA uDAPL was changed from using the ib-cm to the rdma-cm a
while back.  Perhaps you ran on the ib-cm version?  And, the rdma-cma
started using port numbers and enforcing uniqueness even more recently I
think.

Perhaps Don Kerr has some insight on how the Sun uDAPL behaves?  Should
OMPI still need this hack?
From what I recall (and Andrew can probably set me straight if I get this wrong), this hack was included because we were not able to pull the remote port from dat_ep_query. If dat_ep_query supplies that data then we could probably do away with the hack.

I have not heard back from the developer at Sun who implemented uDAPL for Solaris. My thought is that it was also based on the older ib-cm, but I will confirm. I submitted a bug against Solaris uDAPL a while back to provide the port via dat_ep_query, and it looks like it has been fixed. I just have not tested this because we weren't using it.

-DON


Steve.
