Hi Ralph,

Thanks for getting this in!

I verified today on master/HEAD that, modulo the caveats about spawn/pub/sub etc., job launches on Cray using aprun or srun work as expected, so some of the MTT failures over the weekend should go away with runs this week.

Thanks,

Howard

2015-08-31 9:59 GMT-06:00 Ralph Castain <r...@open-mpi.org>:

> Hi folks
>
> Per last week’s telecon, I committed the PR to bring PMIx into the master.
> As discussed, things are generally working okay - we had a little cleanup
> to do once the code was exposed to different environments, but not too
> horrible (thanks Gilles!).
>
> First, a quick status update. We know that the MPI-2 dynamics are broken -
> this includes comm_spawn (will launch but not connect), connect/accept, and
> publish/lookup/unpublish. I am working on those now and hope to have them
> fully operational in the next day or two. Everything else should be
> functional - if not, please report the bug.
>
> There are a few warnings still being emitted for unused functions. Please
> ignore these for the moment as those functions will be used once we
> complete the integration.
>
> Direct modex is working, but we are not yet making use of it. We still
> default to doing a full data exchange at startup. I’m not sure where we are
> relative to the async add_procs, but once that is ready we have the
> necessary support in-place.
>
> You are certainly welcome to help fix issues with the PMIx code! We ask
> that any changes to the embedded PMIx code itself please be posted as PRs
> against the PMIx master - I will update the OMPI master from the PMIx
> tarball. This will help avoid losing your changes as we move forward.
>
> https://github.com/open-mpi/pmix
>
> So - what changed, you ask? Most of the change is transparent, but two
> things are not:
>
> * the OMPI DPM framework has been eliminated and replaced with a core
> ompi/dpm directory. There is now only one way of doing dynamic process
> management, and that is thru the opal/mca/pmix framework, thus letting
> prior PMI implementations also support these functions (as much as they do)
>
> * the OMPI PUB framework has been eliminated. The respective MPI bindings
> now directly call the opal/mca/pmix functions to implement publish, lookup,
> and unpublish
>
>
> As a result of the changes, there isn’t much (if any) interaction between
> the MPI and ORTE layers any more - everything pretty much flows thru the
> OPAL/PMIx interface. Once the STCI folks have a chance to scratch their
> heads a bit, we may find that the OMPI/RTE framework can likewise disappear
> or be significantly reduced.
>
>
> The transparent changes do not currently take advantage of the
> enhanced/extended PMIx functionality - we basically just did a direct
> replacement, with the addition of direct modex support. The “hooks” are
> exposed for OMPI to take advantage of things like notification - we just
> need to decide which ones we want and how/where to wire them into the code.
>
> I’ll be updating the PMIx wiki over the next week or so to better explain
> the overall design. It is somewhat out-of-date in the details, though the
> broad design remains accurate.
>
> HTH
> Ralph
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/08/17902.php
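
For anyone unfamiliar with the distinction Ralph draws between a full data exchange at startup and direct modex, here is a rough toy model in Python. It is only a sketch: every name in it (Job, full_modex, DirectModex, compare) is hypothetical and none of it is Open MPI or PMIx code. Counting one fetch per peer is also a simplification, since the real startup exchange is a collective operation. The point is just that the full exchange retrieves every peer's endpoint data up front, while the direct approach retrieves a peer's data only the first time that peer is actually contacted.

```python
# Toy model only: all names here are hypothetical, not Open MPI or PMIx APIs.


class Job:
    """Holds each rank's published endpoint and counts remote fetches."""

    def __init__(self, nprocs):
        self.endpoints = {rank: f"addr-{rank}" for rank in range(nprocs)}
        self.fetches = 0

    def fetch(self, rank):
        # Stands in for retrieving the endpoint data that `rank` published.
        self.fetches += 1
        return self.endpoints[rank]


def full_modex(job, me):
    """Collect every peer's endpoint at startup, whether or not it is used."""
    return {rank: job.fetch(rank) for rank in job.endpoints if rank != me}


class DirectModex:
    """Fetch a peer's endpoint on first use and cache it afterwards."""

    def __init__(self, job, me):
        self.job = job
        self.me = me
        self.cache = {}

    def lookup(self, rank):
        if rank not in self.cache:
            self.cache[rank] = self.job.fetch(rank)
        return self.cache[rank]


def compare(nprocs=8, peers=(1, 2, 1)):
    """Return (full, direct) fetch counts for rank 0 contacting `peers`."""
    full_job = Job(nprocs)
    full_modex(full_job, me=0)

    direct_job = Job(nprocs)
    direct = DirectModex(direct_job, me=0)
    for peer in peers:
        direct.lookup(peer)
    return full_job.fetches, direct_job.fetches
```

In this model, a rank that only ever talks to a couple of neighbours pays for just those neighbours under the direct scheme, which is why the saving grows with job size when communication is sparse.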