On Fri, 8 Feb 2008, Ralph Castain wrote:

1. event library
2. ROMIO
3. VT
4. backtrace
5. PLPA - this one is a little less obvious, but still being released as a
separate package
6. libNBC

Sorry to Ralph: I clipped everything from his e-mail and am now going to make references to it. Oh well :).

One minor correction -- the backtrace framework as a whole is not third-party code. Only the *DARWIN/Mac OS X* component relies heavily on third-party code; the others (Linux and Solaris) are just wrappers around code in their respective C libraries.

I believe I was responsible for the event library, ROMIO, and backtrace before leaving LANL. I'll go through the motivations and issues with all three in terms of integration.

Event Library: The event library is the core "rendezvous" point for all of Open MPI, so any issues with it cause lots of issues with Open MPI in general. We've also hacked it considerably since taking the original libevent source -- we've renamed all the functions, we've made it thread safe in a way the author was unwilling to do, and we've fixed some performance issues unique to our usage model. In short, this is no longer really the same libevent that might already be installed on the system. Using an unmodified system libevent in its place would be disastrous.

ROMIO: This is one that there was significant discussion about prior to my leaving Los Alamos. There are a number of problems / issues with ROMIO. First and foremost, without ROMIO, we are not a fully compliant MPI implementation. So we have to ship ROMIO -- it's the only way to have that important check mark. But its current integration has some issues -- it's hard to test patches independently. There is actually a mode in the current Open MPI tree where the MPI interface to MPI-I/O is not provided by Open MPI and no io components are built. This is to allow users to build ROMIO independently of Open MPI, for testing updates or whatever. There are some disadvantages to this. First, the independent ROMIO will use generalized requests instead of being hooked into our progress engine, so there may be some progress issues (I never verified either way). Second, it means the user has to deal with building another package at their site. Jeff is correct -- there was discussion about how to make the integration "better" -- many of the changes were on our side, and we were going to have to ask for a couple of changes from Argonne. If someone is going to put in the considerable amount of time to make this happen, I'm happy to write up whatever notes I can remember / find on the issue.

Backtrace: The Darwin backtrace component is mostly maintenance free. It doesn't support 64-bit Intel chips, but that's fine. Once every 18 months or so, I need to get a new copy for the latest operating system, although the truth is I don't think anything bad happens if we just stop doing the updates at each OS release (by the way, I did the one for Leopard, so we're probably all going to be sick of MPI and on to other things before the next time it has to be done). While it's useful, if the community is really worried, it could probably be deleted. But having a stack trace when you segfault sure is nice :).

Brian

