[OMPI devel] Unreliable Datagram BTL

Andrew Friedley Tue, 19 Jun 2007 19:04:20 -0400

Galen asked for a writeup of where the UD BTL support is at and what (important) issues remain, so here it is.

Right now, to ensure MPI's guaranteed-delivery semantics, the DR PML must be used with UD -- the UD BTL does not implement its own reliability. The best solution would be to implement a lightweight reliability protocol within the UD BTL, which would be most effective with a progress thread.
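As a rough illustration of what such a lightweight protocol could look like, here is a minimal Python sketch. It assumes per-peer sequence numbers, cumulative ACKs, and timeout-driven Go-Back-N retransmission. The `Sender`/`Receiver` classes and their methods are hypothetical and not part of Open MPI. `poll()` stands in for the periodic work a progress thread would do.

```python
# Hypothetical sketch of a reliability layer over unreliable datagrams;
# not Open MPI code. Go-Back-N style: cumulative ACKs plus timeout resend.


class Sender:
    def __init__(self, timeout):
        self.timeout = timeout
        self.next_seq = 0
        self.unacked = {}  # seq -> [payload, last_send_time]

    def send(self, payload, now):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = [payload, now]
        return (seq, payload)  # datagram to hand to the UD transport

    def on_ack(self, cum_ack):
        # cum_ack is the receiver's next expected seq; everything below arrived.
        for seq in [s for s in self.unacked if s < cum_ack]:
            del self.unacked[seq]

    def poll(self, now):
        # Resend every datagram whose timer has expired, oldest first,
        # restarting its timer.
        resend = []
        for seq in sorted(self.unacked):
            entry = self.unacked[seq]
            if now - entry[1] >= self.timeout:
                entry[1] = now
                resend.append((seq, entry[0]))
        return resend


class Receiver:
    def __init__(self):
        self.expected = 0
        self.delivered = []

    def on_datagram(self, seq, payload):
        # Accept only the next in-order datagram; drop gaps and duplicates.
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        return self.expected  # cumulative ACK sent back to the sender
```

Go-Back-N keeps the receiver down to a single counter, at the cost of resending datagrams that may already have arrived. A selective-repeat variant would buffer out-of-order datagrams instead.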

Progress threads are a whole other issue... With a quick implementation, I was hitting all sorts of segfaults in the PML. The UD BTL seems unique in that it commonly receives messages and passes them up to the PML out of order. I can revisit this and file some bug reports if that's wanted sooner rather than later.

I know of one outstanding bug -- every test in the Intel suite that uses buffered sends fails with incorrect data. I've shown this problem to George, Galen, and Brian, and we have yet to come up with a fix. It appears to be an issue with messages arriving at the PML out of order, at which point the PML has no datatype information and so cannot reassemble the messages correctly. This would need to be fixed for 1.3.
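One possible way to keep the PML from seeing fragments out of order is a per-peer reorder buffer below it. The Python sketch below assumes each fragment carries a per-peer sequence number. That is an assumption for illustration, not something the UD BTL is stated to do. `ReorderBuffer` and its `deliver` callback are hypothetical names. Early arrivals are parked until all earlier fragments have been delivered, so the upper layer always receives a message's first fragment before the ones that follow it.

```python
# Hypothetical per-peer reorder buffer; not Open MPI code.
# Assumes every fragment is stamped with a per-peer sequence number.


class ReorderBuffer:
    def __init__(self, deliver):
        self.deliver = deliver  # callback into the upper layer (e.g. the PML)
        self.next_seq = 0       # next sequence number the upper layer expects
        self.pending = {}       # seq -> fragment that arrived early

    def receive(self, seq, fragment):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already parked
        self.pending[seq] = fragment
        # Drain every fragment that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.deliver(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

The cost is memory for parked fragments and added latency when a fragment is late. On an unreliable transport, a lost fragment would stall the buffer until some retransmission scheme fills the gap.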

When the UD BTL goes into the trunk, it will always de-select itself unless specifically requested with the MCA btl parameter (e.g. -mca btl ud,self). This prevents the UD BTL from being used by default alongside the existing RC (openib) BTL, which could lower performance.
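For example, a run that explicitly requests the UD BTL, together with the DR PML needed for guaranteed delivery, might look like the command below. `./my_app` and `-np 4` are placeholders. After the planned rename to 'ofud', the btl list would presumably read `ofud,self` instead.

```shell
# Select the DR PML and restrict the BTLs to UD plus self (loopback).
mpirun -mca pml dr -mca btl ud,self -np 4 ./my_app
```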

Some minor issues... When it hits the trunk, it will be called 'ofud', short for OpenFabrics Unreliable Datagrams. Currently RDMA CM is not used, though it will not be hard to switch over (doing it at the same time as the openib BTL seems appropriate to me).


Andrew