On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote:

I thought maybe we should move this to another thread as it really isn't
about Torsten's specific RFC.

I just took a quick gander at the code base to see how extensive this
problem might really be per Terry's concern. What I found was that we have added 3rd party code in several places. How we want to define them in terms
of this issue is probably something for discussion.

Packages I could readily identify include:

1. event library
4. backtrace
5. PLPA - this one is a little less obvious, but still being released as a
separate package

FWIW, these packages are part of "core" OMPI and are not especially problematic. We upgrade them when we have a need or desire to (which has been low frequency); we don't try to stay in sync with their release schedules at all.

2. ROMIO

ROMIO has traditionally been a problem (keeping up with its releases and patches). We have long since agreed that we definitely want to include ROMIO in our tarball, even though that presents challenges. One thing that makes it *slightly* easier is that Brian added the mechanics for OMPI to use a ROMIO that is outside of Open MPI rather than the one that is bundled with it. It's not a perfect solution, but it does help some.

3. VT
6. libNBC

These two are definitely in the "contrib" category.

There may well be others - these are only the ones I know about. By 3rd
party package, I mean these are blocks of code obtained as a complete,
distinct version and "dropped in" to the OMPI code repository, and then to
some degree tied into our build system. They are not code specifically
developed for OMPI by OMPI developers.

Those are all that I'm aware of.

We have already discussed the issues with this approach. I am particularly
concerned with the maintenance and release cycle issues right now.

If these packages could be linked to our code instead of embedded within it,
then it seems to me that updating them could become much easier. For
example, we could download and install the latest ROMIO + Panasas patch, compile it, and simply link it into libompi - without occupying someone with
constantly fixing the build system issues, etc.

FWIW:

- event, backtrace, PLPA, and ROMIO are included in OMPI because we wanted to certify them as part of "core" OMPI. That is, we wanted to certify the whole system (vs. relying on [untested] combinations of versions that already exist on users' systems).

- ROMIO is likely the only one of that group that presents ongoing logistics problems. The mechanism Brian added was seen as a workaround. Argonne will definitely need to be involved at some level to improve the ROMIO integration. Some talks started between Brian, me, and Rob (ANL) about a) making our integration better/easier, and b) having access to the ROMIO SVN to be able to suck down releases when we want to, but they kinda tapered off (Brian left and I got other priorities). There was also talk of LANL maintaining its own ROMIO tree and pushing it into OMPI, but I don't know what happened there. I can help with part of the make-the-ROMIO-integration-easier work (not in the immediate future, though -- probably not for a few weeks), but I do not think that I can do it on an ongoing basis. Note, too, that ROMIO is no longer distributed as a separate package -- it's only included in MPICH2. So it's a little harder to just link against a ROMIO that is already installed on a system -- there won't be one that isn't already bundled with an MPI.

- VT and libNBC are in a different category; they are add-on functionality, not "core" OMPI.

Obviously, I don't claim to know enough about what was done to integrate ROMIO to know if this would easily work. I only use it to illustrate the
point - the same could be said about the event library, for example.

Given our maintenance support problems, it would seem to me that changing
the way we do 3rd party packaging may be worth consideration and some
effort. I can't prioritize that relative to 1.3, though I do note that, from LANL's perspective, the ROMIO issue is a definite blocker for 1.3 release.

Hmm. This is odd because of the prior statements about ROMIO from LANL (that LANL was going to maintain ROMIO and push it into OMPI). I'm assuming that's changed?

If ROMIO is a v1.3 blocker for LANL, can LANL commit resources to fixing the problem?

--
Jeff Squyres
Cisco Systems
