I'm going to "re-integrate" Jeff and Brian's comments into one response.

I have no problem with either of their observations. I only included the
event library, backtrace, and PLPA in my list for completeness. I expected
we would continue to treat those as we are, recognizing that this means
-someone- is going to have to step up to support those when we need to
update them. In the event library case, I know people have talked about a
major change coming soon - a release that has significant improvements we may
care about. Not sure when that might happen, or who is going to do that
integration.

As to ROMIO: as with many of the community's "planned" contributions, the
plans have tended to fade with time and personnel turnover. At this time, there is
no way LANL could support a ROMIO integration without a significant delay to
the proposed 1.3 release schedule. Not that such a delay particularly
bothers me - I don't see a pressing need to just throw something out there,
and I have been beaten severely around the neck-and-shoulders the last two
days about how out of date our ROMIO version is, and that it lacks a
critical Panasas patch that is severely impacting performance.

I'll continue to talk to people here about possibly getting help with ROMIO.
I don't know the prospects, but it will take some time for someone to become
familiar enough with our code base/build system to make a real contribution.
Alternatively, -I- may have to take this on, which will definitely delay the
1.3 RTE work, effectively just transferring the "blocker" from one part of
the code to another. ;-)

But we can deal with that on a separate thread. For now, I think Jeff's last
response to the other thread is where we are converging: delay work on a 3rd
party contribution system until we have more cycles, but don't bring more
3rd party code (post-libNBC) in until we have a better mechanism.

Ralph


On 2/8/08 9:06 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote:
> 
>> I thought maybe we should move this to another thread as it really
>> isn't
>> about Torsten's specific RFC.
>> 
>> I just took a quick gander at the code base to see how extensive this
>> problem might really be per Terry's concern. What I found was that
>> we have
>> added 3rd party code in several places. How we want to define them
>> in terms
>> of this issue is probably something for discussion.
>> 
>> Packages I could readily identify include:
>> 
>> 1. event library
>> 4. backtrace
>> 5. PLPA - this one is a little less obvious, but still being
>> released as a
>> separate package
> 
> FWIW, these packages are part of "core" OMPI and are not especially
> problematic.  We upgrade them when we have a need or desire to (which
> has been low frequency); we don't try to stay in sync with their
> release schedules at all.
> 
>> 2. ROMIO
> 
> ROMIO has traditionally been a problem (keeping up with its releases
> and patches).  We have long-since agreed that we definitely want to
> include ROMIO in our tarball, even though that presents challenges.
> One thing that makes it *slightly* easier is that Brian added the
> mechanics for OMPI to use a ROMIO that is outside of Open MPI rather
> than the one that is bundled with it.  It's not a perfect solution,
> but it does help some.
> 
>> 3. VT
>> 6. libNBC
> 
> These two are definitely in the "contrib" category.
> 
>> There may well be others - these are only the ones I know about. By
>> 3rd
>> party package, I mean these are blocks of code obtained as a complete,
>> distinct version and "dropped in" to the OMPI code repository, and
>> then to
>> some degree tied into our build system. They are not code specifically
>> developed for OMPI by OMPI developers.
> 
> Those are all that I'm aware of.
> 
>> We have already discussed the issues with this approach. I am
>> particularly
>> concerned with the maintenance and release cycle issues right now.
>> 
>> If these packages could be linked to our code instead of embedded
>> within it,
>> then it seems to me that updating them could become much easier. For
>> example, we could download and install the latest ROMIO + Panasas
>> patch,
>> compile it, and simply link it into libompi - without occupying
>> someone with
>> constantly fixing the build system issues, etc.
> 
> FWIW:
> 
> - event,backtrace,PLPA,ROMIO are included in OMPI because we wanted to
> certify them as part of "core" OMPI.  That is, we wanted to certify
> the whole system (vs. relying on [untested] combinations of versions
> that already exist on users' systems).
> 
> - ROMIO is likely the only one of that group that presents ongoing
> logistics problems.  The mechanism Brian added was seen as a
> workaround.  Argonne will definitely need to be involved at some level
> to improve the ROMIO integration.  Some talks started between Brian,
> me, and Rob(ANL) about a) making our integration better/easier, and b)
> having access to the ROMIO SVN to be able to suck down releases when
> we want to, but they kinda tapered off (Brian left and I got other
> priorities).  There was also talk of LANL maintaining its own ROMIO
> tree and pushing it into OMPI, but I don't know what happened there.
> I can help with part of the ROMIO make-the-integration-easier (not in
> the immediate future, though -- probably not for a few weeks), but I
> do not think that I can do it on an ongoing basis.  Note, too, that
> ROMIO is no longer distributed as a separate package -- it's only
> included in MPICH2.  So it's a little harder to just link against a
> ROMIO that is already installed on a system -- there won't be one that
> isn't already bundled with an MPI.
> 
> - vt and libnbc are a different category; they are add-on
> functionality, not "core" OMPI.
> 
>> Obviously, I don't claim to know enough about what was done to
>> integrate
>> ROMIO to know if this would easily work. I only use it to illustrate
>> the
>> point - the same could be said about the event library, for example.
>> 
>> Given our maintenance support problems, it would seem to me that
>> changing
>> the way we do 3rd party packaging may be worth consideration and some
>> effort. I can't prioritize that relative to 1.3, though I do note
>> that, from
>> LANL's perspective, the ROMIO issue is a definite blocker for 1.3
>> release.
> 
> Hmm.  This is odd because of the prior statements about ROMIO from
> LANL (that LANL was going to maintain ROMIO and push it into OMPI).
> I'm assuming that's changed?
> 
> If ROMIO is a v1.3 blocker for LANL, can LANL commit resources to
> fixing the problem?

