Short version:
--------------

The modular wireup code on /tmp/jms-modular-wireup seems to be working. Can people give it a whirl before I bring it back to the trunk? The more esoteric your hardware setup, the better.

Longer version:
---------------

I think that I have completed round 1 of the modular wireup work in /tmp/jms-modular-wireup, meaning that all the wireup code has been moved out of btl_openib_endpoint.* and into connect/*. The endpoint.c file now simply calls the connect interface through a function pointer (allowing the choice of the current RML-based wireup or the RDMA CM). The selected connect "module" will call back to the openib endpoint for two things:

1. to post receive buffers on a locally-created-but-not-yet-connected qp
2. to signal that the qp is fully connected and ready to be used
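
To make the shape of this concrete, here is a rough sketch in C of a connect "module" selected through a function pointer, with the two callbacks into the endpoint. Every name in it (demo_endpoint_t, demo_connect_module_t, the DEMO_* codes, and so on) is made up for illustration and is not taken from the actual code in connect/*; the real interface may well look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the real ones. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_IMPLEMENTED = -1 };

/* Hypothetical, stripped-down endpoint: just enough state to show
 * the two callbacks that a connect module makes. */
typedef struct demo_endpoint {
    int recvs_posted; /* set once receive buffers are posted on the qp */
    int connected;    /* set once the qp is fully connected */
} demo_endpoint_t;

/* Callback 1: post receive buffers on a locally created,
 * not-yet-connected qp. */
static void demo_endpoint_post_recvs(demo_endpoint_t *ep)
{
    ep->recvs_posted = 1;
}

/* Callback 2: the qp is fully connected and ready to be used. */
static void demo_endpoint_connected(demo_endpoint_t *ep)
{
    ep->connected = 1;
}

/* A connect "module" is a name plus a function pointer that the
 * endpoint code calls to start the wireup. */
typedef struct demo_connect_module {
    const char *name;
    int (*start_connect)(demo_endpoint_t *ep);
} demo_connect_module_t;

/* Stand-in for the RML-based wireup: create the qp, ask the endpoint
 * to post receives, exchange connection info (elided here), then
 * tell the endpoint that the qp is ready. */
static int demo_rml_start_connect(demo_endpoint_t *ep)
{
    assert(ep != NULL);
    demo_endpoint_post_recvs(ep);
    /* ... out-of-band exchange of qp information would go here ... */
    demo_endpoint_connected(ep);
    return DEMO_SUCCESS;
}

/* Stand-in for the RDMA CM module, which is not implemented yet. */
static int demo_rdma_cm_start_connect(demo_endpoint_t *ep)
{
    (void)ep;
    return DEMO_ERR_NOT_IMPLEMENTED;
}

static const demo_connect_module_t demo_rml = {
    "rml", demo_rml_start_connect
};
static const demo_connect_module_t demo_rdma_cm = {
    "rdma_cm", demo_rdma_cm_start_connect
};

/* The endpoint code only ever goes through the function pointer, so
 * it does not need to know which wireup method was selected. */
static int demo_endpoint_connect(const demo_connect_module_t *module,
                                 demo_endpoint_t *ep)
{
    return module->start_connect(ep);
}
```

Calling demo_endpoint_connect(&demo_rml, &ep) on a zeroed endpoint leaves both flags set, while passing &demo_rdma_cm returns the not-implemented code and leaves the endpoint untouched. The point of the indirection is that the endpoint code stays the same regardless of which module is plugged in.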

This cleaned up the endpoint.* code a *lot*. I also simplified the RML connection code a bit -- I removed some useless sub-functions, etc.

I *think* that this new connection code is all working, but per http://www.open-mpi.org/community/lists/devel/2007/07/2058.php, I'm seeing other weird failures so I'm a little reluctant to put this back on the trunk until I know that everything is working properly. Granted, the failures in the other post sound like pml errors and this should be a wholly separate issue (we would get different warnings/errors if the btl failed to connect), but still -- it seems a little safer to be prudent.

Still to do:

- make the static rate be exchanged and set properly during the RML wireup
- RDMA CM support (it returns ERR_NOT_IMPLEMENTED right now)

--
Jeff Squyres
Cisco Systems
