Yo all

There has been a bit of discussion about this on the core developers list
and on telecons, but I felt that perhaps I should provide a more detailed
warning to the broader developer community.

In the next few weeks, there will be some major revisions submitted to the
Open MPI trunk on the OpenRTE (ORTE) side of the code base. These will
primarily address three known issues:

1. Scalability - the test code that was run on Sandia's Thunderbird cluster
a few weeks ago utilized a stage gate and trigger to help speed up launch of
the OpenRTE daemons on backend nodes. In addition, some code cleanup
occurred in the TM launcher. These improvements yielded a positive result,
and they will be brought over to the trunk with these changes.

2. MAD (MPI_Abort Disease) - we have encountered a problem whereby daemons
are left "spinning" wildly when MPI processes call MPI_Abort. This is
symptomatic of a circular logic loop that has crept into the abort handling
section of the OpenRTE code base. These changes will resolve that problem.

3. Daemon timeout on start - currently, we will wait forever for all daemons
to start because, in some environments, we have no way to detect that a
daemon has failed to launch. We are adding a timeout mechanism (adjustable
via MCA param, of course) that will allow orte/mpirun to give up after some
period of time.
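
For anyone who wants a concrete picture of the timeout in item 3, here is a
minimal sketch of the idea in Python. It is an illustration only, not the
ORTE code: the names wait_for_daemons, poll_started, and timeout_s are
hypothetical, and the real mechanism will take its limit from an MCA param
whose name is not given here.

```python
import time


def wait_for_daemons(expected, poll_started, timeout_s, interval_s=0.01):
    """Wait until `expected` daemons have reported in, or give up.

    `poll_started` is a hypothetical callable returning how many daemons
    have reported so far. Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if poll_started() >= expected:
            return True   # every daemon checked in before the deadline
        if time.monotonic() >= deadline:
            return False  # give up instead of waiting forever
        time.sleep(interval_s)
```

The caller decides what to do on a False return (for example, report which
launch failed and abort the job); the sketch only shows the bounded wait.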

As part of these revisions, I am working to bring the code base another step
closer to OpenRTE 2.0 compatibility. As a result, some of the changes may
appear unnecessary in terms of fixing the three issues noted above. I
apologize in advance for that, but beg your indulgence as these changes will
make eventual integration with 2.0 a little easier.

The upcoming revisions will involve changes to the RDS, RAS, PLS, ERRMGR,
RMGR, and SMR frameworks in the form of API changes. Most of these changes
are not massive, but impact a number of places in the code. However,
significant change will occur in several places:

1. the ERRMGR will see significant change in actual behavior as we clarify
its role. New components to differentiate behavior between head node
processes (HNPs), daemons (our orteds), and application processes are being
created.

2. communications to the OpenRTE daemons (orteds) will no longer take place
via individual frameworks but will be concentrated through the existing
orted non-blocking receive function. This will help us break the circular
logic loop and (hopefully) avoid re-creating it in the future.

3. the PLS "fork" component really was the orted's private launcher for
local processes. It has been moved to the orted's directory and renamed to
indicate that fact. Although there were good reasons to do this before, it
could not previously be done due to the built-in calls to the PLS - however,
with the new clarification of roles, this can now be cleanly done.

4. ALL resource management functionality has been constrained to the HNP.
Non-HNP processes (including orteds and application processes) solely
communicate their requests back to the HNP for execution. In addition, in
accordance with the OpenRTE 2.0 design, all resource management frameworks
(i.e., RDS, RAS, RMAPS, and PLS) are now publicly accessible rather than
reachable only through the RMGR.

5. the RMAPS framework has been changed to support multiple concurrent
mapping components, and a parameter has been added to the "map" API so the
caller can specify which component should be used for a given map command.
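
To illustrate the RMAPS change in item 5, here is a rough Python sketch of a
"map" call that selects among several loaded mapping components by name. It
is not the actual RMAPS API: the function names, the component names
"by_node" and "by_slot", and the slots_per_node default are all hypothetical
and chosen only to show the shape of the idea.

```python
def map_by_node(nprocs, nodes):
    # Hypothetical component: place processes round-robin across nodes.
    return [nodes[i % len(nodes)] for i in range(nprocs)]


def map_by_slot(nprocs, nodes, slots_per_node=2):
    # Hypothetical component: fill each node's slots before moving on,
    # wrapping around if the job oversubscribes the nodes.
    return [nodes[(i // slots_per_node) % len(nodes)] for i in range(nprocs)]


# Several mapping components can be loaded at the same time.
MAPPERS = {"by_node": map_by_node, "by_slot": map_by_slot}
DEFAULT_MAPPER = "by_node"


def map_job(nprocs, nodes, mapper=None):
    """Map a job using the named component, or the default if none is given."""
    name = mapper or DEFAULT_MAPPER
    if name not in MAPPERS:
        raise ValueError(f"no such mapping component: {name}")
    return MAPPERS[name](nprocs, nodes)
```

With two nodes n0 and n1, map_job(4, ["n0", "n1"]) gives n0, n1, n0, n1,
while map_job(4, ["n0", "n1"], mapper="by_slot") gives n0, n0, n1, n1.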

For those of you with components in the affected frameworks, I am going
through them and making changes to keep them compatible with the revisions.
Again, these changes aren't major, but will require some checking to ensure
they are correct, especially for those components that compile only in a
specific environment.

I hope to complete this work in the next 2-3 weeks. The work is taking place
on the tmp/mad branch - those of you with access to it are welcome to keep
track of what I am doing.

Prior to committing this massive a change to the trunk, I will be performing
testing on various platforms. I will also be contacting key people with
access to platforms beyond my domain to ask their help in testing the branch
in those environments.

And yes - I WILL send out a note alerting people to the upcoming commit
prior to throwing it into the trunk!

Feel free to contact me with any comments, suggestions, or concerns.
Ralph

