Hi folks

Per last week’s telecon, I committed the PR to bring PMIx into the master. As 
discussed, things are generally working okay - we had a little cleanup to do 
once the code was exposed to different environments, but not too horrible 
(thanks Gilles!).

First, a quick status update. We know that the MPI-2 dynamics are broken - this 
includes comm_spawn (will launch but not connect), connect/accept, and 
publish/lookup/unpublish. I am working on those now and hope to have them fully 
operational in the next day or two. Everything else should be functional - if 
not, please report the bug.

There are a few warnings still being emitted for unused functions. Please 
ignore these for the moment as those functions will be used once we complete 
the integration.

Direct modex is working, but we are not yet making use of it. We still default 
to doing a full data exchange at startup. I’m not sure where we are relative to 
the async add_procs, but once that is ready we have the necessary support 
in place.
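
For anyone unfamiliar with the distinction, here is a toy sketch of the two 
exchange patterns. It is only an illustration under simplifying assumptions: 
the Server, full_modex, and DirectModexProc names are hypothetical and are not 
part of the PMIx or OMPI code. A full modex hands every process every peer's 
data at startup, while a direct modex fetches a peer's data on first use and 
caches it.

```python
# Toy model only: all names here are hypothetical, not the PMIx API.

class Server:
    """Stands in for the local daemon holding each process's posted data."""

    def __init__(self):
        self.store = {}      # rank -> endpoint blob
        self.requests = 0    # number of on-demand lookups served

    def put(self, rank, blob):
        self.store[rank] = blob

    def get(self, rank):
        self.requests += 1
        return self.store[rank]


def full_modex(server, nprocs):
    """Full exchange: every process gets a copy of every peer's data."""
    return {rank: dict(server.store) for rank in range(nprocs)}


class DirectModexProc:
    """Direct modex: fetch a peer's data on first use, then cache it."""

    def __init__(self, rank, server):
        self.rank = rank
        self.server = server
        self.cache = {}

    def endpoint(self, peer):
        if peer not in self.cache:
            self.cache[peer] = self.server.get(peer)
        return self.cache[peer]


if __name__ == "__main__":
    nprocs = 4
    server = Server()
    for r in range(nprocs):
        server.put(r, f"addr-{r}")

    # Full exchange: nprocs * nprocs entries are held across the job.
    tables = full_modex(server, nprocs)
    print(sum(len(t) for t in tables.values()))

    # Direct modex: rank 0 talks to rank 1 twice, but only one fetch occurs.
    p0 = DirectModexProc(0, server)
    p0.endpoint(1)
    p0.endpoint(1)
    print(server.requests)
```

In this toy model the full exchange stores one entry per (process, peer) pair 
up front, whereas the direct path only pays for the peers a process actually 
contacts.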

You are certainly welcome to help fix issues with the PMIx code! We ask that 
any changes to the embedded PMIx code itself please be posted as PRs against 
the PMIx master - I will update the OMPI master from the PMIx tarball. This 
will help avoid losing your changes as we move forward.

https://github.com/open-mpi/pmix

So - what changed, you ask? Most of the change is transparent, but two things 
are not:

* the OMPI DPM framework has been eliminated and replaced with a core ompi/dpm 
directory. There is now only one way of doing dynamic process management, and 
that is thru the opal/mca/pmix framework, thus letting prior PMI 
implementations also support these functions (as much as they do)

* the OMPI PUB framework has been eliminated. The respective MPI bindings now 
directly call the opal/mca/pmix functions to implement publish, lookup, and 
unpublish


As a result of the changes, there isn’t much (if any) interaction between the 
MPI and ORTE layers any more - everything pretty much flows thru the OPAL/PMIx 
interface. Once the STCI folks have a chance to scratch their heads a bit, we 
may find that the OMPI/RTE framework can likewise disappear or be significantly 
reduced.


The transparent changes do not currently take advantage of the 
enhanced/extended PMIx functionality - we basically just did a direct 
replacement, with the addition of direct modex support. The “hooks” are exposed 
for OMPI to take advantage of things like notification - we just need to decide 
which ones we want and how/where to wire them into the code.

I’ll be updating the PMIx wiki over the next week or so to better explain the 
overall design. It is somewhat out-of-date in the details, though the broad 
design remains accurate.

HTH
Ralph
