Steve Wise wrote:
There have been a series of discussions on the ofa general list about
this issue, and the conclusion to date is that it cannot be resolved in
the rdma-cm or iwarp-cm code of the Linux RDMA stack, mainly because
sending an RDMA message involves the ULP's work queue and completion
queue; the CM cannot do this under the covers in a manner that doesn't
affect the application. Thus, the applications must deal with this.
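
To make the constraint concrete, here is a minimal sketch of the accepting side in libibverbs terms, assuming the QP, CQ, and a registered receive buffer have already been set up (the function name and parameters are illustrative, not an existing API):

#include <stdint.h>
#include <stddef.h>
#include <infiniband/verbs.h>

/* Illustrative only: the accepting (passive) peer on iWARP must not
 * post a send until the initiator's first message has arrived. */
static int wait_for_initiator_first_send(struct ibv_qp *qp,
                                         struct ibv_cq *cq,
                                         struct ibv_mr *recv_mr,
                                         void *recv_buf, size_t len)
{
    /* Post the receive before (or immediately after) accepting, so
     * the initiator's first send has somewhere to land. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)recv_buf,
        .length = (uint32_t)len,
        .lkey   = recv_mr->lkey,
    };
    struct ibv_recv_wr wr  = { .wr_id = 1, .sg_list = &sge, .num_sge = 1 };
    struct ibv_recv_wr *bad;
    if (ibv_post_recv(qp, &wr, &bad))
        return -1;

    /* Poll the ULP's completion queue until that receive completes.
     * These are exactly the application-owned resources the CM cannot
     * touch under the covers. */
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ; /* busy-wait for brevity; real code would block or back off */
    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        return -1;

    /* Only now is it safe for this side to ibv_post_send(). */
    return 0;
}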

Why can't uDAPL deal with this? As a uDAPL user, I really don't care what API uDAPL uses under the hood to move data from one place to another, nor about the quirks of that API. The whole point of uDAPL is to form a network-agnostic abstraction layer, and AFAIK the uDAPL spec doesn't impose any such requirement on RDMA communication either. In my opinion, exposing this behavior above uDAPL is incorrect, and it is part of why uDAPL has seen limited adoption -- every uDAPL implementation behaves differently, making it extremely difficult to write an application that works on all of them. Sorry if this sounds harsh, but it comes from many hours of banging my head against the wall working around these sorts of problems :)


Here is a possible solution:
I assume that in OMPI, connections are only initiated when the MPI
application does a send operation. Given that, the udapl BTL must
ensure that if a given rank accepts a connection, it does not send
anything until the rank at the other end of the connection sends first.
Since the other side initiated the connection, it will have pending data
to send...
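
In code, the gate might look something like this sketch (the type and field names are invented for illustration; the real udapl BTL structures differ):

/* Hypothetical per-endpoint state for gating sends on accepted
 * connections. Sends on a connection this rank initiated may proceed
 * immediately; sends on an accepted connection must wait until the
 * peer's first message has been received. */
struct endpoint_gate {
    int we_initiated;    /* nonzero if this rank called connect()  */
    int peer_sent_first; /* set when the first receive completes   */
};

static int send_allowed(const struct endpoint_gate *ep)
{
    return ep->we_initiated || ep->peer_sent_first;
}

Sends attempted while send_allowed() is false would be queued on the endpoint and drained once the first receive completes.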

I haven't looked into how painful this will be to implement.

Thoughts?

Following on what I wrote above, I think Open MPI is the wrong place to be dealing with this. There are enough of these hacks as it is; I'm not interested in seeing more get added.

Andrew
