Thank you. At least it's clear now that, for the immediate problem, I have to look at the IOF code.
On 16. 10. 2015 03:32, Gilles Gouaillardet wrote:
> Justin,
>
> IOF stands for Input/Output (aka I/O) Forwarding.
>
> Here is a very high-level overview of a quite simple case.
> On host A, you run
>     mpirun -host B,C -np 2 a.out
> without any batch manager, over the TCP interconnect.
>
> First, mpirun will fork&exec
>     ssh B orted ...
>     ssh C orted ...
>
> The orted daemons will first connect back to mpirun, using TCP and the
> ip/port passed on the orted command line.
>
> Then the orted daemons will fork&exec a.out.
> a.out will contact its parent orted (IIRC, TCP on v1.10 and a Unix
> socket from v2.x) via the ip/port taken from the environment.
> When the a.out processes want to communicate, they will connect to the
> remote a.out via TCP, using the ip/port obtained from orted.
>
> From a.out's point of view:
> - stdin is either a pipe to orted or /dev/null
> - stdout is a pty with orted on the other side
> - stderr is a pipe to orted
>
> This is basically what happens in a quite simple case.
> Back to your question: mpi_hello.so does not contact mpirun.
> orted.so contacts mpirun, mpi_hello.so contacts orted.so,
> and then mpi_hello.so contacts the other mpi_hello.so.
>
>
> Note it is also possible to use direct launch (SLURM or Cray ALPS can
> do that).
> Instead of running
>     mpirun a.out
> you simply do
>     srun a.out   (or aprun a.out)
> In the case of SLURM (I am not sure about ALPS) there are no orted
> daemons involved.
> Instead of contacting its orted, a.out contacts the SLURM daemons
> (slurmd) so it can exchange information with the remote a.out processes
> and figure out how to contact them.
> Direct launch does not support dynamic process creation
> (MPI_Comm_spawn and friends).
>
>
> You can run
>     ompi_info --all
> to list all the parameters,
> and then you can do
>     mpirun --mca <name> <value> ...
> to modify a parameter (such as a timeout).
>
> That being said, I do not think that should be needed ... just make
> sure there is no firewall running on your system, and you should be fine.
> If some hosts have several interfaces, you can restrict to the one
> that should work (e.g. eth0) with
>     mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...
>
>
> I hope this helps,
>
> Gilles
>
>
> On 10/16/2015 2:59 AM, Justin Cinkelj wrote:
>> I'm trying to run OpenMPI in an OSv container
>> (https://github.com/cloudius-systems/osv). It's a single-process,
>> single-address-space VM, without the fork, exec, or openpty functions.
>> With some butchering of OSv and OpenMPI I was able to compile orted.so
>> and run it inside OSv via mpirun (mpirun is on a remote machine).
>> The orted.so loads mpi_hello.so and executes its main() in a new pthread.
>>
>> Which then aborts due to a communication failure/timeout, as reported
>> by mpirun. I assume that mpi_hello.so should connect back to mpirun
>> and report 'something' about itself. What could that be?
>> Plus, where could I extend that timeout period? Once mpirun closes,
>> output from opal_output is not shown any more.
>>
>> Is there some high-level overview of OpenMPI: how the modules are
>> connected, what the 'startup' sequence is, etc.?
>> ompi_info lists the compiled modules, but I still don't know how they
>> are connected.
>>
>> So basically, I lack knowledge of OpenMPI internals and would highly
>> appreciate links for "rookie" developers.
>> Say, https://github.com/open-mpi/ompi/wiki/IOFDesign tells me what IOF
>> is, and a bit about how it works. So, if someone has a list of such
>> links - could it be shared?
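
To check that I understand the stdio wiring described above, here is a rough sketch of how an orted-like launcher could set up a child process: a pipe for stdin, a pty for stdout, a pipe for stderr, and a loop that forwards the child's output. This is only my own illustration, not Open MPI's actual IOF code, and it leans on fork() and openpty() (exactly the calls OSv does not provide), so this is the part I would have to replace.

/*
 * Illustrative sketch only, NOT Open MPI's actual IOF code: the kind of
 * stdio wiring described above. The launcher keeps one end of a pipe for
 * the child's stdin and stderr, and the master side of a pty for the
 * child's stdout, then forwards whatever the child writes.
 * On Linux, link with -lutil for openpty().
 */
#include <pty.h>        /* openpty() - one of the calls OSv lacks */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int in_pipe[2], err_pipe[2];   /* launcher -> child stdin, child stderr -> launcher */
    int pty_master, pty_slave;     /* child stdout goes through a pty */

    if (pipe(in_pipe) < 0 || pipe(err_pipe) < 0 ||
        openpty(&pty_master, &pty_slave, NULL, NULL, NULL) < 0) {
        perror("setup");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                        /* child: plays the role of a.out */
        dup2(in_pipe[0],  STDIN_FILENO);   /* stdin  <- pipe */
        dup2(pty_slave,   STDOUT_FILENO);  /* stdout -> pty  */
        dup2(err_pipe[1], STDERR_FILENO);  /* stderr -> pipe */
        close(in_pipe[1]);
        close(err_pipe[0]);
        close(pty_master);
        execlp("echo", "echo", "hello from the launched process", (char *)NULL);
        _exit(127);                        /* only reached if exec failed */
    }

    /* parent: plays the role of orted; close the child's ends and forward
     * the child's stdout (stderr forwarding via err_pipe[0] omitted here) */
    close(in_pipe[0]);
    close(err_pipe[1]);
    close(pty_slave);

    char buf[256];
    ssize_t n;
    while ((n = read(pty_master, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);

    waitpid(pid, NULL, 0);
    return 0;
}

In OSv, with no fork() or openpty(), I suppose the equivalent would have to be a pthread plus a socket or an in-process buffer instead of the pipe/pty pair.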
