Steve Wise wrote:
Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work.  I'm
debugging now.


Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):

    /* TODO - big bad evil hack! */
    /* uDAPL doesn't ever seem to keep track of ports with addresses.  This
       becomes a problem when we use dat_ep_query() to obtain a remote address
       on an endpoint.  In this case, both the DAT_PORT_QUAL and the sin_port
       field in the DAT_SOCK_ADDR are 0, regardless of the actual port. This is
       a problem when we have more than one uDAPL process per IA - these
       processes will have exactly the same address, as the port is all
       we have to differentiate who is who.  Thus, our uDAPL EP -> BTL EP
       matching algorithm will break down.

       So, we insert the port we used for our PSP into the DAT_SOCK_ADDR for
       this IA.  uDAPL then conveniently propagates this to where we need it.
     */
    ((struct sockaddr_in*)attr.ia_address_ptr)->sin_port = htons(port);
    ((struct sockaddr_in*)&btl->udapl_addr.addr)->sin_port = htons(port);

The OMPI code stuffs the port chosen by udapl for a listening endpoint
into the ia address memory (which is owned by the udapl layer btw).
There's a slight problem with that:  The OFA udapl openib_cma code binds
cm_id's to this ia_address regularly.  When an hca is opened, a cm_id is
bound to this address to obtain the local hca port number and gid that
is being used.  In addition, a cm_id is bound to this address each time
an endpoint is created (either at ep_create time or ep_connect time).
So that ia_address field is used by the dapl cm to create local
cm_ids...  Since the port was always zero, the rdma_cma would choose a
unique port for each cm_id bound to that address.
But once OMPI sets the port field to non-zero, the rdma_cma fails all the
subsequent rdma_bind_addr() calls since the port is already in use.
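
The failure mode described above can be reproduced in miniature with ordinary TCP sockets, which follow the same rule: binding to port 0 lets the stack pick a free port, while binding twice to the same non-zero port fails. This is only an analogy and a minimal sketch; it does not call rdma_bind_addr() or any uDAPL code, and bind_loopback() is a hypothetical helper written for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of OMPI or uDAPL): bind a new TCP socket
 * to 127.0.0.1:port, where port is in host byte order and 0 means
 * "let the stack choose".  On success, returns the socket fd and stores
 * the port actually bound in *bound_port.  On failure, returns -1 with
 * errno set by the failing call. */
static int bind_loopback(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(bound_port != NULL);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }

    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

Calling bind_loopback(0, &p) twice should give two different ports, which corresponds to the case where the ia_address port is left at zero. Calling it again with the port returned by the first call, while that socket is still open, is expected to fail with EADDRINUSE, which corresponds to what happens once a fixed port has been written into the shared address.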

Perhaps this hack really is a workaround for a DAPL bug where somebody's
dapl wasn't tracking port numbers correctly?

Yep. My memory is dim, but I think that was OFED's DAPL, or it was in the generic part of DAPL that all implementations seem to share.

As hinted by the comment (I wrote it by the way), I think the best solution would be if dat_ep_query() returned the port number correctly. Most of uDAPL seems to just pass around pointers to internal data structures (which I'm not sure is the best idea in the world), so it didn't seem like a trivial fix to me at the time. I remember considering reporting this as a bug, but I didn't because the uDAPL standard didn't seem to enforce any requirements on passing the port number around with the address, so it technically wasn't wrong.

Was the OFED uDAPL code switched from something else to RDMA CM at some point? I'm almost certain I was running fine on OFED's uDAPL at one point (in fact, a lot of the uDAPL BTL development I did was using the OFED stack).


I'm going to run a few experiments:

1) remove the OMPI hack and see if things work fine for OFA udapl.
Perhaps OFA udapl correctly tracks ports on endpoints?

Doubt it, but worth trying.

2) leave OMPI as-is and change OFA udapl to not assume the ia_addr
sockaddr has a 0 port in it.

Pretty sure this will work, don't know if it's the correct solution though.
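
For what it's worth, option 2 could be sketched roughly as follows: before binding, the provider copies the IA address into a temporary and clears the port in the copy, so whatever the consumer wrote into the shared ia_address no longer matters. This is only an assumption about how such a change might look, not the actual OFA uDAPL code; copy_addr_any_port() is a made-up name, and the real dapl structures and call sites are not shown.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the OFA uDAPL sources): copy an IPv4 or
 * IPv6 socket address into dst and zero the port in the copy, so that a
 * later bind on dst asks for "any free port".  The source address is
 * left untouched.  Returns 0 on success, -1 for an unsupported family. */
static int copy_addr_any_port(const struct sockaddr *src,
                              struct sockaddr_storage *dst)
{
    assert(src != NULL && dst != NULL);

    memset(dst, 0, sizeof(*dst));

    switch (src->sa_family) {
    case AF_INET:
        memcpy(dst, src, sizeof(struct sockaddr_in));
        ((struct sockaddr_in *)dst)->sin_port = 0;
        return 0;
    case AF_INET6:
        memcpy(dst, src, sizeof(struct sockaddr_in6));
        ((struct sockaddr_in6 *)dst)->sin6_port = 0;
        return 0;
    default:
        return -1;
    }
}
```

The provider would then pass the copy, rather than the IA address itself, to its bind call. The port OMPI stuffed into the IA address stays visible to anything that reads that address later, which is what the hack relies on.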

Andrew
