Hi folks

Per last week’s telecon, I committed the PR to bring PMIx into the master. As discussed, things are generally working okay - we had a little cleanup to do once the code was exposed to different environments, but not too horrible (thanks Gilles!).
First, a quick status update. We know that the MPI-2 dynamics are broken - this includes comm_spawn (will launch but not connect), connect/accept, and publish/lookup/unpublish. I am working on those now and hope to have them fully operational in the next day or two. Everything else should be functional - if not, please report the bug.

There are a few warnings still being emitted for unused functions. Please ignore these for the moment, as those functions will be used once we complete the integration.

Direct modex is working, but we are not yet making use of it. We still default to doing a full data exchange at startup. I’m not sure where we are relative to the async add_procs, but once that is ready we have the necessary support in place.

You are certainly welcome to help fix issues with the PMIx code! We ask that any changes to the embedded PMIx code itself please be posted as PRs against the PMIx master - I will update the OMPI master from the PMIx tarball. This will help avoid losing your changes as we move forward.

https://github.com/open-mpi/pmix

So - what changed, you ask? Most of the change is transparent, but two things are not:

* the OMPI DPM framework has been eliminated and replaced with a core ompi/dpm directory. There is now only one way of doing dynamic process management, and that is thru the opal/mca/pmix framework, thus letting prior PMI implementations also support these functions (as much as they do)

* the OMPI PUB framework has been eliminated. The respective MPI bindings now directly call the opal/mca/pmix functions to implement publish, lookup, and unpublish

As a result of the changes, there isn’t much (if any) interaction between the MPI and ORTE layers any more - everything pretty much flows thru the OPAL/PMIx interface. Once the STCI folks have a chance to scratch their heads a bit, we may find that the OMPI/RTE framework can likewise disappear or be significantly reduced.
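For anyone not familiar with the difference between the full data exchange at startup and direct modex mentioned in the status update, here is a rough toy model in C. It is only a sketch of the idea and does not use the PMIx or OPAL APIs: every name in it (toy_server_t, toy_client_t, toy_full_modex, toy_get, NPROCS, BLOB_LEN, the "endpoint-of-rank-N" strings) is made up for illustration. The assumption is simply that a full exchange retrieves every peer's data up front, while direct modex retrieves a peer's data the first time it is asked for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NPROCS   4    /* hypothetical job size */
#define BLOB_LEN 32   /* hypothetical max size of one rank's posted data */

/* Toy "server": holds the data that every rank has posted. */
typedef struct {
    char blob[NPROCS][BLOB_LEN];
} toy_server_t;

/* Toy "client": local cache plus a count of retrievals from the server. */
typedef struct {
    char cache[NPROCS][BLOB_LEN];
    int  have[NPROCS];   /* 1 if the peer's data is already cached */
    int  fetches;        /* number of per-peer retrievals performed */
} toy_client_t;

/* Fill the server with one made-up string per rank. */
void toy_server_init(toy_server_t *s)
{
    memset(s, 0, sizeof(*s));
    for (int r = 0; r < NPROCS; r++) {
        snprintf(s->blob[r], BLOB_LEN, "endpoint-of-rank-%d", r);
    }
}

/* Start with an empty cache and a zeroed retrieval counter. */
void toy_client_init(toy_client_t *c)
{
    memset(c, 0, sizeof(*c));
}

/* Retrieve one peer's data from the server and cache it locally. */
void toy_fetch(toy_client_t *c, const toy_server_t *s, int peer)
{
    memcpy(c->cache[peer], s->blob[peer], BLOB_LEN);
    c->have[peer] = 1;
    c->fetches++;
}

/* Full exchange: pull every peer's data at startup, needed or not. */
void toy_full_modex(toy_client_t *c, const toy_server_t *s)
{
    for (int r = 0; r < NPROCS; r++) {
        toy_fetch(c, s, r);
    }
}

/* Lookup: use the cache if present, otherwise retrieve on demand.
 * The on-demand branch is the "direct modex" path in this toy model. */
const char *toy_get(toy_client_t *c, const toy_server_t *s, int peer)
{
    assert(peer >= 0 && peer < NPROCS);
    if (!c->have[peer]) {
        toy_fetch(c, s, peer);
    }
    return c->cache[peer];
}
```

In this model, a client that calls toy_full_modex() pays for NPROCS retrievals before it talks to anyone, while a client that only calls toy_get() pays for one retrieval per peer it actually contacts.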
The transparent changes do not currently take advantage of the enhanced/extended PMIx functionality - we basically just did a direct replacement, with the addition of direct modex support. The “hooks” are exposed for OMPI to take advantage of things like notification - we just need to decide which ones we want and how/where to wire them into the code.

I’ll be updating the PMIx wiki over the next week or so to better explain the overall design. It is somewhat out-of-date in the details, though the broad design remains accurate.

HTH
Ralph