One suggestion: this approach requires that the job be launched with “mpirun”. An alternative would be to integrate PMIx into Kubernetes, which would allow any job to call MPI_Init regardless of how it was started. That would enable MPI for workflow-based applications that aren’t really supported by mpirun and require their own application manager.
See https://pmix.org for more info

Ralph

> On May 24, 2018, at 9:02 PM, Rong Ou <rong...@gmail.com> wrote:
>
> Hi guys,
>
> Thanks for all the suggestions! It's been a while but we finally got it
> approved for open sourcing. I've submitted a proposal to kubeflow:
> https://github.com/kubeflow/community/blob/master/proposals/mpi-operator-proposal.md
> In this version we've managed to not use ssh, relying on `kubectl exec`
> instead. It's still pretty "ghetto", but at least we've managed to train some
> tensorflow models with it. :) Please take a look and let me know what you
> think.
>
> Thanks,
>
> Rong
>
> On Fri, Mar 16, 2018 at 11:38 AM r...@open-mpi.org wrote:
> I haven’t really spent any time with Kubernetes, but it seems to me you could
> just write a Kubernetes plm (and maybe an odls) component and bypass the ssh
> stuff completely given that you say there is a launcher API.
>
> > On Mar 16, 2018, at 11:02 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >
> > On Mar 16, 2018, at 10:01 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> >>
> >> By default, Open MPI uses the rsh PLM in order to start a job.
> >
> > To clarify one thing here: the name of our plugin is "rsh" for historical
> > reasons, but it defaults to looking for "ssh" first. If it
> > finds ssh, it uses it. Otherwise, it tries to find rsh and use that.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel
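
As an aside, the lookup order Jeff describes in the quoted thread (try ssh first, fall back to rsh) can be sketched in a few lines of Python. This is only an illustration of that order, not Open MPI's actual code: the function name `find_launch_agent` and its `which` parameter are hypothetical, and the real plugin is written in C and is configurable in ways this sketch ignores.

```python
import shutil
from typing import Callable, Optional, Sequence


def find_launch_agent(
    candidates: Sequence[str] = ("ssh", "rsh"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found, or None.

    Hypothetical helper: mirrors the "ssh first, then rsh" order
    described in the thread. `which` is injectable so the lookup
    can be exercised without touching the real PATH.
    """
    for name in candidates:
        path = which(name)
        if path is not None:
            # First hit wins, so ssh is preferred over rsh.
            return path
    return None


if __name__ == "__main__":
    agent = find_launch_agent()
    print(agent if agent else "neither ssh nor rsh found on PATH")
```

Passing a different `candidates` tuple changes the preference order, which is roughly the knob a Kubernetes-specific launcher would replace entirely.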