George, I started to dig more into option 2 as you described it. I believe I can make that work. For example, I created this fake ssh:
$ cat ~/bin/ssh
#!/bin/bash
fname=env.$$
echo ">>>>>>>>>>>>> ssh" >> $fname
env >>$fname
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>" >>$fname
echo $@ >>$fname

And this one prints all the args that the remote process will receive:

>>>>>>>>>>>>>>>>>>>>>>>>>>
-x 10.0.35.43 orted -mca ess "env" -mca ess_base_jobid "2752512000" -mca ess_base_vpid 1 -mca ess_base_num_procs "3" -mca orte_node_regex "ip-[2:10]-0-16-120,[2:10].0.35.43,[2:10].0.35.42@0(3)" -mca orte_hnp_uri "2752512000.0;tcp://10.0.16.120:44789" -mca plm "rsh" --tree-spawn -mca routed "radix" -mca orte_parent_uri "2752512000.0;tcp://10.0.16.120:44789" -mca rmaps_base_mapping_policy "node" -mca pmix "^s1,s2,cray,isolated"

Now I am thinking that I probably don't even need to create all those Open MPI env variables, as I am hoping the orted that is started remotely will start the final executable with the right env set. Does this sound right?

Thx,
--Gabriel

On Fri, Mar 5, 2021 at 3:15 PM George Bosilca <bosi...@icl.utk.edu> wrote:

> Gabriel,
>
> You should be able to. Here are at least two different ways of doing this.
>
> 1. Purely MPI. Start singletons (or smaller groups) and connect via
> sockets using MPI_Comm_join. You can set up your own DNS-like service, with
> the goal of having the independent MPI jobs leave a trace there, such that
> they can find each other and create the initial socket.
>
> 2. You could replace ssh/rsh with a no-op script (one that returns success,
> so that the mpirun process thinks it successfully started the processes),
> and then handcraft the environment as you did for GASNet.
>
> 3. We have support for a DVM (Distributed Virtual Machine), which basically
> creates an independent service that different mpirun instances can connect to
> in order to retrieve information. Each mpirun uses this DVM as a singleton and
> falls back to MPI_Comm_connect/accept to recreate an MPI world.
>
> Good luck,
>   George.
>
>
> On Fri, Mar 5, 2021 at 2:08 PM Ralph Castain via devel <
> devel@lists.open-mpi.org> wrote:
>
>> I'm afraid that won't work - there is no way for the job to "self
>> assemble". One could create a way to do it, but it would take some
>> significant coding in the guts of OMPI to get there.
>>
>>
>> On Mar 5, 2021, at 9:40 AM, Gabriel Tanase via devel <
>> devel@lists.open-mpi.org> wrote:
>>
>> Hi all,
>> I decided to use MPI as the messaging layer for a multihost database.
>> However, within my org I faced very strong opposition to allowing passwordless
>> ssh or rsh. For security reasons we want to minimize the opportunities to
>> execute arbitrary code on the db clusters. I don't want to run other
>> things like Slurm, etc.
>>
>> My question would be: Is there a way to start an MPI application by
>> running certain binaries on each host? E.g., if my executable is "myapp",
>> can I start a server (orted?) on host zero and then start myapp on each
>> host with the right env variables set (for specifying the rank, num ranks,
>> etc.)?
>>
>> For example, when using another messaging API (GASNet) I was able to start
>> a server on host zero and then manually start the application binary on
>> each host (with some environment variables properly set) and all was good.
>>
>> I tried to reverse engineer a little the env variables used by mpirun
>> (mpirun -np 2 env), and then I copied these env variables into a shell script
>> prior to invoking my hello world executable, but I got an error message implying
>> a server is not present:
>>
>> PMIx_Init failed for the following reason:
>>
>>   NOT-SUPPORTED
>>
>> Open MPI requires access to a local PMIx server to execute. Please ensure
>> that either you are operating in a PMIx-enabled environment, or use "mpirun"
>> to execute the job.
>>
>> Here is the shell script for host0:
>>
>> $ cat env1.sh
>> #!/bin/bash
>>
>> export OMPI_COMM_WORLD_RANK=0
>> export PMIX_NAMESPACE=mpirun-38f9d3525c2c-53291@1
>> export PRTE_MCA_prte_base_help_aggregate=0
>> export TERM_PROGRAM=Apple_Terminal
>> export OMPI_MCA_num_procs=2
>> export TERM=xterm-256color
>> export SHELL=/bin/bash
>> export PMIX_VERSION=4.1.0a1
>> export OPAL_USER_PARAMS_GIVEN=1
>> export TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T/
>> export Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.HCXmdRI1WL/Render
>> export PMIX_SERVER_URI41=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export TERM_PROGRAM_VERSION=421.2
>> export PMIX_RANK=0
>> export TERM_SESSION_ID=18212D82-DEB2-4AE8-A271-FB47AC71337B
>> export OMPI_COMM_WORLD_LOCAL_RANK=0
>> export OMPI_ARGV=
>> export OMPI_MCA_initial_wdir=/Users/igtanase/ompi
>> export USER=igtanase
>> export OMPI_UNIVERSE_SIZE=2
>> export SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.PhcplcX3pC/Listeners
>> export OMPI_COMMAND=./exe
>> export __CF_USER_TEXT_ENCODING=0x54984577:0x0:0x0
>> export OMPI_FILE_LOCATION=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T//prte.38f9d3525c2c.1419265399/dvm.53291/1/0
>> export PMIX_SERVER_URI21=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export PATH=/Users/igtanase/ompi/bin/:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
>> export OMPI_COMM_WORLD_LOCAL_SIZE=2
>> export PRTE_MCA_pmix_session_server=1
>> export PWD=/Users/igtanase/ompi
>> export OMPI_COMM_WORLD_SIZE=2
>> export OMPI_WORLD_SIZE=2
>> export LANG=en_US.UTF-8
>> export XPC_FLAGS=0x0
>> export PMIX_GDS_MODULE=hash
>> export XPC_SERVICE_NAME=0
>> export HOME=/Users/igtanase
>> export SHLVL=2
>> export PMIX_SECURITY_MODE=native
>> export PMIX_HOSTNAME=38f9d3525c2c
>> export LOGNAME=igtanase
>> export OMPI_WORLD_LOCAL_SIZE=2
>> export PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_NON_DESC
>> export PRTE_LAUNCHED=1
>> export PMIX_SERVER_TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T//prte.38f9d3525c2c.1419265399/dvm.53291
>> export OMPI_COMM_WORLD_NODE_RANK=0
>> export OMPI_MCA_cpu_type=x86_64
>> export PMIX_SYSTEM_TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T/
>> export PMIX_SERVER_URI4=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export OMPI_NUM_APP_CTX=1
>> export SECURITYSESSIONID=186a9
>> export PMIX_SERVER_URI3=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export PMIX_SERVER_URI2=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export _=/usr/bin/env
>>
>> ./exe
>>
>> Thx for your help,
>> --Gabriel
>>
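
For reference, a minimal sketch of George's option 1: two independently started MPI singletons rendezvous over a plain TCP socket and merge with MPI_Comm_join. The port number, the server/client roles, and passing the peer hostname on the command line are illustrative assumptions, not part of the thread; a real deployment would publish the address through whatever DNS-like service the database already provides, and would add the error handling omitted here.

/*
 * join_sketch.c - assumes an MPI-2 capable implementation (e.g., Open MPI).
 * Build: mpicc join_sketch.c -o join_sketch
 * Run:   ./join_sketch server          (on host A)
 *        ./join_sketch client hostA    (on host B)
 */
#include <mpi.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define JOIN_PORT 5555   /* arbitrary port chosen for the example */

int main(int argc, char **argv)
{
    if (argc < 2 || (strcmp(argv[1], "server") != 0 && argc < 3)) {
        fprintf(stderr, "usage: %s server | client <server-host>\n", argv[0]);
        return 1;
    }

    MPI_Init(&argc, &argv);            /* each process starts as a singleton */
    int is_server = (strcmp(argv[1], "server") == 0);
    int fd;

    if (is_server) {
        /* Wait for exactly one peer to connect. */
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(JOIN_PORT);
        bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
        listen(lsock, 1);
        fd = accept(lsock, NULL, NULL);
        close(lsock);
    } else {
        /* Connect to the server whose hostname was given on the command line. */
        struct hostent *he = gethostbyname(argv[2]);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);
        addr.sin_port = htons(JOIN_PORT);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    }

    /* Both sides call MPI_Comm_join on the connected socket; the result is
     * an intercommunicator spanning the two previously independent jobs. */
    MPI_Comm inter, world;
    MPI_Comm_join(fd, &inter);
    MPI_Intercomm_merge(inter, is_server ? 0 : 1, &world);

    int rank, size;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &size);
    printf("joined world: rank %d of %d\n", rank, size);

    MPI_Comm_free(&world);
    MPI_Comm_free(&inter);
    close(fd);
    MPI_Finalize();
    return 0;
}

Option 3 would presumably follow a similar shape, except the rendezvous information is a port name obtained from MPI_Open_port and exchanged out of band (e.g., through the DVM), with MPI_Comm_accept on one side and MPI_Comm_connect on the other.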