George,
I started to dig more into option 2 as you described it. I believe I can make
that work.
For example, I created this fake ssh:

$ cat ~/bin/ssh
#!/bin/bash
fname=env.$$
echo ">>>>>>>>>>>>> ssh" >> "$fname"
env >> "$fname"
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>" >> "$fname"
echo "$@" >> "$fname"

And here are all the arguments that the remote process would receive:

>>>>>>>>>>>>>>>>>>>>>>>>>>
-x 10.0.35.43 orted -mca ess "env" -mca ess_base_jobid "2752512000" -mca
ess_base_vpid 1 -mca ess_base_num_procs "3" -mca orte_node_regex
"ip-[2:10]-0-16-120,[2:10].0.35.43,[2:10].0.35.42@0(3)" -mca orte_hnp_uri
"2752512000.0;tcp://10.0.16.120:44789" -mca plm "rsh" --tree-spawn -mca
routed "radix" -mca orte_parent_uri "2752512000.0;tcp://10.0.16.120:44789"
-mca rmaps_base_mapping_policy "node" -mca pmix "^s1,s2,cray,isolated"
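
My rough plan is to have that replacement ssh hand the captured orted command
line off to whatever launch mechanism security will approve, along these lines
(just a sketch: approved-remote-exec is a placeholder for that mechanism, and
the argument order is taken from the capture above):

#!/bin/bash
# Sketch of a replacement ~/bin/ssh: instead of opening an ssh session,
# forward the orted command line to an approved launch mechanism.
# Based on the capture above, the arguments arrive as: -x <host> orted ...
host="$2"
shift 2                              # drop "-x <host>", keep the orted command line
approved-remote-exec "$host" "$@" &  # placeholder for the approved mechanism
exit 0                               # report success so mpirun keeps going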

Now I am thinking that I probably don't even need to create all those
Open MPI env variables, as I am hoping the orted that is started remotely
will launch the final executable with the right environment set. Does this
sound right?
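
As a sanity check, I could point the launch at a tiny wrapper instead of the
real binary and have it dump whatever OMPI_/PMIX_ variables orted actually
hands to each rank (again only a sketch; ./myapp stands in for the real
executable):

$ cat check_env.sh
#!/bin/bash
# Log the environment each rank actually receives, then exec the real app.
env | grep -E '^(OMPI_|PMIX_)' > /tmp/rank_env.$$
exec ./myapp "$@"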

Thx,
--Gabriel


On Fri, Mar 5, 2021 at 3:15 PM George Bosilca <bosi...@icl.utk.edu> wrote:

> Gabriel,
>
> You should be able to. Here are at least 2 different ways of doing this.
>
> 1. Purely MPI. Start singletons (or smaller groups), and connect via
> sockets using MPI_Comm_join. You can set up your own DNS-like service, with
> the goal of having the independent MPI jobs leave a trace there, such that
> they can find each other and create the initial socket.
>
> 2. You could replace ssh/rsh with a no-op script (that returns success
> such that the mpirun process thinks it successfully started the processes),
> and then handcraft the environment as you did for GASNet.
>
> 3. We have support for a DVM (Distributed Virtual Machine) that basically
> creates an independent service to which different mpirun instances can
> connect to retrieve information. The mpirun instances using this DVM run as
> singletons, and fall back to MPI_Comm_connect/accept to recreate an MPI world.
>
> Good luck,
>   George.
>
>
> On Fri, Mar 5, 2021 at 2:08 PM Ralph Castain via devel <
> devel@lists.open-mpi.org> wrote:
>
>> I'm afraid that won't work - there is no way for the job to "self
>> assemble". One could create a way to do it, but it would take some
>> significant coding in the guts of OMPI to get there.
>>
>>
>> On Mar 5, 2021, at 9:40 AM, Gabriel Tanase via devel <
>> devel@lists.open-mpi.org> wrote:
>>
>> Hi all,
>> I decided to use MPI as the messaging layer for a multihost database.
>> However, within my org I faced very strong opposition to allowing passwordless
>> ssh or rsh. For security reasons we want to minimize the opportunities to
>> execute arbitrary code on the db clusters. I don't want to run other
>> things like Slurm, etc.
>>
>> My question would be: Is there a way to start an MPI application by
>> running certain binaries on each host? E.g., if my executable is "myapp",
>> can I start a server (orted???) on host zero and then start myapp on each
>> host with the right env variables set (for specifying the rank, number of
>> ranks, etc.)?
>>
>> For example, when using another messaging API (GASNet), I was able to start
>> a server on host zero and then manually start the application binary on
>> each host (with some environment variables properly set) and all was good.
>>
>> I tried to reverse engineer a little the env variables used by mpirun
>> (mpirun -np 2 env) and then copied these env variables into a shell script
>> prior to invoking my hello world executable, but I got an error message
>> implying a server is not present:
>>
>> PMIx_Init failed for the following reason:
>>
>>   NOT-SUPPORTED
>>
>> Open MPI requires access to a local PMIx server to execute. Please ensure
>> that either you are operating in a PMIx-enabled environment, or use
>> "mpirun"
>> to execute the job.
>>
>> Here is the shell script for host0:
>>
>> $ cat env1.sh
>> #!/bin/bash
>>
>> export OMPI_COMM_WORLD_RANK=0
>> export PMIX_NAMESPACE=mpirun-38f9d3525c2c-53291@1
>> export PRTE_MCA_prte_base_help_aggregate=0
>> export TERM_PROGRAM=Apple_Terminal
>> export OMPI_MCA_num_procs=2
>> export TERM=xterm-256color
>> export SHELL=/bin/bash
>> export PMIX_VERSION=4.1.0a1
>> export OPAL_USER_PARAMS_GIVEN=1
>> export TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T/
>> export Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.HCXmdRI1WL/Render
>> export PMIX_SERVER_URI41=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export TERM_PROGRAM_VERSION=421.2
>> export PMIX_RANK=0
>> export TERM_SESSION_ID=18212D82-DEB2-4AE8-A271-FB47AC71337B
>> export OMPI_COMM_WORLD_LOCAL_RANK=0
>> export OMPI_ARGV=
>> export OMPI_MCA_initial_wdir=/Users/igtanase/ompi
>> export USER=igtanase
>> export OMPI_UNIVERSE_SIZE=2
>> export SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.PhcplcX3pC/Listeners
>> export OMPI_COMMAND=./exe
>> export __CF_USER_TEXT_ENCODING=0x54984577:0x0:0x0
>> export OMPI_FILE_LOCATION=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T//prte.38f9d3525c2c.1419265399/dvm.53291/1/0
>> export PMIX_SERVER_URI21=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export PATH=/Users/igtanase/ompi/bin/:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
>> export OMPI_COMM_WORLD_LOCAL_SIZE=2
>> export PRTE_MCA_pmix_session_server=1
>> export PWD=/Users/igtanase/ompi
>> export OMPI_COMM_WORLD_SIZE=2
>> export OMPI_WORLD_SIZE=2
>> export LANG=en_US.UTF-8
>> export XPC_FLAGS=0x0
>> export PMIX_GDS_MODULE=hash
>> export XPC_SERVICE_NAME=0
>> export HOME=/Users/igtanase
>> export SHLVL=2
>> export PMIX_SECURITY_MODE=native
>> export PMIX_HOSTNAME=38f9d3525c2c
>> export LOGNAME=igtanase
>> export OMPI_WORLD_LOCAL_SIZE=2
>> export PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_NON_DESC
>> export PRTE_LAUNCHED=1
>> export PMIX_SERVER_TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T//prte.38f9d3525c2c.1419265399/dvm.53291
>> export OMPI_COMM_WORLD_NODE_RANK=0
>> export OMPI_MCA_cpu_type=x86_64
>> export PMIX_SYSTEM_TMPDIR=/var/folders/_k/c4_xr5vd14j97fw7j8vzmd45_9hjbq/T/
>> export PMIX_SERVER_URI4=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export OMPI_NUM_APP_CTX=1
>> export SECURITYSESSIONID=186a9
>> export PMIX_SERVER_URI3=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export PMIX_SERVER_URI2=mpirun-38f9d3525c2c-53291@0.0;tcp4://192.168.0.180:52093
>> export _=/usr/bin/env
>>
>> ./exe
>>
>> Thx for your help,
>> --Gabriel
>>
>>
>>
