Ralph, Gilles,

thanks for your input.

Before I answer, let me briefly explain my general intention.
We have our own resource manager and process launcher, which supports
different MPI implementations in different ways. I want to adapt it to
PMIx to cleanly support OpenMPI and, hopefully, other MPI implementations
that support PMIx in the future.

> It sounds like what you really want to do is replace the orted, and
> have your orted open your PMIx server? In other words, you want to
> use the PMIx reference library to handle all the PMIx stuff, and
> provide your own backend functions to support the PMIx server calls? 

You are right, that was my original plan, and I have already done that
so far. In my environment I can already launch processes that successfully
call PMIx client functions like put, get, fence and so on, all handled by
my servers using the PMIx server helper library. With the server
functions I have implemented so far, all the example programs that come
with the PMIx library work fine.

Then I tried to use it with OpenMPI and ran into problems.
My first idea was to simply replace orted, but after taking a closer
look at OpenMPI it seems to me that it uses (or needs) orted not only for
spawning processes and exchanging process information, but also for its
general communication and collectives. Am I wrong about that?

So replacing it completely is perhaps not what I want, since I do not
intend to replace OpenMPI's whole communication layer. But perhaps I am
mixing up ORTE and orted here; I am not certain about that.

> If so, then your best bet would be to edit the PRRTE code in
> orte/orted/pmix and replace it with your code. You’ll have to deal
> with the ORTE data objects and PRRTE’s launch procedure, but that is
> likely easier than trying to write your own version of “orted” from
> scratch.

I think one problem here is that I do not really understand which
purposes orted serves overall, especially besides implementing the PMIx
server side. Can you please give me a short overview?

> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
> implements the server backend functions, and the Slurm daemons “host”
> the plugin. What you would need to do is replace that plugin with
> your own.

I understand that, but it also seems to require special support from the
various Slurm modules on the OpenMPI side, which I do not understand yet.
At least when I tried OpenMPI without Slurm support together with
`srun --mpi=pmix_v2`, it did not work and instead printed a message that
Slurm support in OpenMPI is missing.



Stephan



> 
> > On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> > 
> > Stephan,
> > 
> > 
> > Have you already checked https://github.com/pmix/prrte ?
> > 
> > 
> > This is the PMIx Reference RunTime Environment (PRRTE), which was
> > built on top of orted.
> > 
> > Long story short, it deploys the PMIx server, and then you start
> > your MPI app with prun.
> > An example is available at
> > https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
> > 
> > 
> > Cheers,
> > 
> > Gilles
> > 
> > 
> > On 10/9/2018 8:45 AM, Stephan Krempel wrote:
> > > Hello everyone,
> > > 
> > > I am currently implementing a PMIx server and trying to use it
> > > with OpenMPI. I have my own mpiexec, which starts my PMIx server
> > > and launches the processes.
> > > 
> > > If I launch an executable linked against OpenMPI, the ORTE layer
> > > starts another PMIx server during MPI_Init() and overrides my
> > > PMIX_* environment, so this new server is used instead of mine.
> > > 
> > > So I am looking for a way to prevent orte(d) from starting a PMIx
> > > server.
> > > 
> > > I already tried to understand what the Slurm support is doing,
> > > since this is (at least in part) what I think I need. Somehow,
> > > when starting a job with srun --mpi=pmix_v2, the ess module pmi is
> > > used, but I was not able to enforce that manually by setting an
> > > MCA parameter (ess should be the correct one?!?).
> > > And I do not yet have a clue how the Slurm support works.
> > > 
> > > So does anyone have a hint where I can find documentation or
> > > information about this, or is there an easy way to achieve what
> > > I am trying to do that I have missed?
> > > 
> > > Thank you in advance.
> > > 
> > > Regards,
> > > 
> > > Stephan
> > > _______________________________________________
> > > devel mailing list
> > > devel@lists.open-mpi.org
> > > https://lists.open-mpi.org/mailman/listinfo/devel
> > > 