Ralph Castain wrote:
On 7/17/07 5:37 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
On Jul 16, 2007, at 2:28 PM, Matthew Moskewicz wrote:
MPI-2 does support the MPI_COMM_JOIN and MPI_COMM_ACCEPT/MPI_COMM_CONNECT
models. We do support these in Open MPI, but the restrictions (in terms of
ORTE) may be too limiting for you.
Perhaps I'll experiment -- any clues as to what the ORTE restrictions
might be?
The main constraint is that you have to run a "persistent" orted that
will span all your MPI_COMM_WORLDs. We have only lightly tested
this scenario -- Ralph, can you comment more here?
Actually, I'm not convinced Open MPI really supports either of those two MPI
semantics. It is true that we have something in our code repository, but I'm
not convinced it actually does what people think.
There are two use-cases one must consider:
1. an application spawns another job and then at some later point wants
to connect to it. Our current implementation of comm_spawn does this
automatically via the accept/connect procedure, so we have this covered.
However, it relies upon the notion that (a) the parent job *knows* the jobid
of the child, and (b) the parent sends a message to the child telling it
where and how to rendezvous with it. You don't need the persistent daemon
here. (A minimal sketch of this pattern follows the list below.)
2. a user starts one application, and then starts another (it would have to
be in a separate window or batch job, as we do not support running mpirun in
the background) that connects to the first. The problem here is that neither
application knows the jobid of the other, has any information on how to
communicate with the other, or knows a common rendezvous point. You would
definitely need a persistent daemon for this use-case. (A sketch of this
rendezvous problem also follows below.)
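To make the first use-case concrete, here is a minimal sketch (mine, not
the actual Open MPI sources) of a parent calling MPI_Comm_spawn; "child_prog"
is a hypothetical executable name. Note that no daemon and no explicit
accept/connect call are needed -- the rendezvous happens inside the library:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm child;        /* intercommunicator to the spawned job */
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* The library does the accept/connect under the covers: the parent
     * knows the child's jobid and tells it where to rendezvous. */
    MPI_Comm_spawn("child_prog", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &child, errcodes);

    /* ... exchange messages over the intercommunicator ... */

    MPI_Comm_disconnect(&child);
    MPI_Finalize();
    return 0;
}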
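And for the second use-case, a minimal sketch of the standard MPI-2
rendezvous through the name service; "ocean" is a hypothetical service name,
and the role is picked from argv[1]. The point is that MPI_Publish_name and
MPI_Lookup_name can only succeed if both jobs reach a common name server --
in ORTE terms, the persistent daemon:

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        /* server: open a port and register it under a well-known name */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("ocean", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Unpublish_name("ocean", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        /* client: look the port up by name, then connect */
        MPI_Lookup_name("ocean", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}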
I would have to review the code to see, but my best guess from what I
remember is that we don't actually support the second use-case at this time.
It would be possible to do so, albeit complicated - but I'm almost certain
nobody ever implemented it. I had talked at one time about providing the
necessary glue, either at the command line or (better) via some internal
"magic", but never got much interest - and so never did anything about
it...and I don't recall seeing anyone else make the necessary changes.
FWIW, these are the instructions that we documented for OMPI v1.2 for
client/server applications (MPI_COMM_ACCEPT and MPI_COMM_CONNECT) from
different jobs.
-----------------------------------------------------------------------------------------
USING MPI CLIENT/SERVER APPLICATIONS
The instructions in this section explain how to get best results when
starting Open MPI client/server applications.
To Start the Persistent Daemon
Note – The persistent daemon needs to run on the node where mpirun is
started.
1. Use the cd command to move to the directory that contains the Sun HPC
ClusterTools 7 binaries:
% cd /opt/SUNWhpc/HPC7.0/bin
2. To start the persistent daemon, issue the following command,
substituting the name of your MPI job's universe for univ1:
% orted --persistent --seed --scope public --universe univ1 --debug
The --persistent flag to orted (the ORTE daemon) starts the persistent
daemon. You also need to set the --seed and --scope public options on the
same command line, as shown in the example. The optional --debug flag prints
out debugging messages.
To Launch the Client/Server Job
Note – Make sure you launch all MPI client/server jobs from the same node
on which you started the persistent daemon.
1. Type the following command to launch the server application,
substituting the name of your MPI job's universe for univ1:
% ./mpirun -np 1 --universe univ1 t_accept
2. Type the following command to launch the client application,
substituting the name of your MPI job's universe for univ1:
% ./mpirun -np 4 --universe univ1 t_connect
If the client and server jobs span more than one node, the first job (that
is, the server job) must specify on the mpirun command line all the nodes
that will be used. Specifying the node names allocates those hosts for the
entire universe of server and client jobs.
For example, if the server runs on node0 and the client job runs only on
node1, the command to launch the server must specify both nodes (using the
-host node0,node1 flag) even if it uses only one process on node0.
Assuming that the persistent daemon is started on node0, the command to
launch the server would look like this:
node0% ./mpirun -np 1 --universe univ1 -host node0,node1 t_accept
The command to launch the client is:
node0% ./mpirun -np 4 --universe univ1 -host node1 t_connect
Note – Name publishing does not work between jobs in different universes.
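Since name publishing cannot cross universes, one workaround (a sketch of
mine, not the source of the actual t_accept/t_connect test programs) is to
pass the port name out of band: the server prints the string returned by
MPI_Open_port, and the client takes it on its command line:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc > 1) {
        /* client role: port name supplied as the first argument */
        MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {
        /* server role: open a port and print it for the operator */
        MPI_Open_port(MPI_INFO_NULL, port);
        if (rank == 0) {
            printf("port name: %s\n", port);
            fflush(stdout);
        }
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Close_port(port);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

Start the server first, copy the printed port string, and pass it (quoted)
as the client's first argument.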