[OMPI devel] MPI_Comm_spawn[_multiple] and orted

Pak Lui Wed, 31 May 2006 14:11:27 -0400

Hi,

When I run a spawn program over rsh/ssh, I notice that each time a
child program is spawned, a new rsh/ssh connection to the remote node
is established to launch orted on that node, even though the parent
executable and its orted are already running on that node.


So I wonder if there is any way that we can use the parent orted to
launch the child program if they happen to be on the same node?

I tried comparing the spawn program to the scenario where I run
multiple executables in one mpirun command. For this run, only one
connection to the remote node is established, and both executables
share that connection.


% ./mpirun -np 2 -host burl-ct-v440-5 -prefix `pwd`/.. sleep 12 : -np 2 sleep 10
Password:

15015 /workspace/paklui/ompi/trunk/builds/sparc32-g/bin/../bin/orted --bootprox
  15017 sleep 12
  15019 sleep 12
  15021 sleep 10
  15023 sleep 10

The reason I want to find out whether orted can launch child
executable(s) without establishing a new connection is that the number
of times I can run 'qrsh' in SGE (or N1GE) depends on the number of
slots that the user initially allocated. The slot count corresponds to
the number of CPUs on a node, and each slot allows one 'qrsh'
connection.

The issue arises when I try to run a spawn job on a single node, or on
a cluster of many 1-CPU nodes, under SGE. The number of times that the
program can spawn is limited by 'qrsh', which prevents the child
program from connecting to a node where the parent executable's orted
may already be running.
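
To make the slot arithmetic concrete, here is a minimal Python sketch of
the constraint as I understand it. It is only a toy model, not Open MPI
or SGE code: the names Node, open_qrsh, launch and successful_launches
are hypothetical, and it assumes that a 'qrsh' connection stays held for
the lifetime of the job and that a launch onto a node with a running
daemon could reuse that daemon.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of one SGE node: each slot permits one qrsh connection."""
    slots: int
    qrsh_used: int = 0
    has_daemon: bool = False

    def open_qrsh(self) -> bool:
        """Try to take one qrsh connection; fail once all slots are used."""
        if self.qrsh_used >= self.slots:
            return False
        self.qrsh_used += 1
        return True


def launch(node: Node, reuse_daemon: bool) -> bool:
    """Launch one job (initial run or a spawn) on the node.

    With reuse_daemon=True, a daemon already on the node starts the
    child and no new connection is needed (the behaviour asked about).
    With reuse_daemon=False, every launch opens a new connection
    (the behaviour observed today).
    """
    if reuse_daemon and node.has_daemon:
        return True
    if not node.open_qrsh():
        return False
    node.has_daemon = True
    return True


def successful_launches(slots: int, attempts: int, reuse_daemon: bool) -> int:
    """Count how many consecutive launches succeed on a single node."""
    node = Node(slots=slots)
    count = 0
    for _ in range(attempts):
        if not launch(node, reuse_daemon):
            break
        count += 1
    return count


if __name__ == "__main__":
    # One slot, parent launch followed by two spawns.
    print(successful_launches(slots=1, attempts=3, reuse_daemon=False))
    print(successful_launches(slots=1, attempts=3, reuse_daemon=True))
```

Under these assumptions, on a 1-slot node only the parent launch
succeeds when every launch needs its own connection, while all three
launches succeed when the existing daemon is reused.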

I am curious to see if I can find a solution to this problem. I am
also looking into whether there are tricks in SGE to get around the
issue, but the workarounds I can see aren't pretty. So I welcome your
questions, comments, or suggestions on this.


--

Thanks,

- Pak Lui
pak....@sun.com
