Forwarding because Pavel wasn't subscribed to the devel list, so his message was rejected.
Begin forwarded message:

> From: Pavel Emelyanov <xe...@parallels.com>
> Subject: Re: [CRIU] [OMPI devel] Open MPI and CRIU stdout/stderr
> Date: March 19, 2014 9:32:00 AM EDT
> To: Adrian Reber <adr...@lisas.de>
> Cc: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>, Open MPI Developers <de...@open-mpi.org>, "<c...@openvz.org>" <c...@openvz.org>
>
> On 03/19/2014 05:25 PM, Jeff Squyres (jsquyres) wrote:
>> On Mar 19, 2014, at 9:13 AM, Adrian Reber <adr...@lisas.de> wrote:
>>
>>> What does Open MPI do with the file descriptors for stdout/stderr?
>>
>> We admittedly do funny things with stdin, stdout, and stderr... The short
>> version is that OMPI intercepts all the stdin, stdout, and stderr from each
>> MPI process and relays it back up to mpirun through our IOF subsystem
>> (IOF = I/O forwarding).
>>
>> Consider: users launch N processes (potentially on multiple different
>> servers) via
>>
>>     mpirun --hostfile hosts -np N my_mpi_executable
>>
>> They also expect to be able to use standard shell redirection via the
>> mpirun command. For example:
>>
>>     mpirun --hostfile hosts -np N my_mpi_executable |& tee out.txt
>>
>> To explain what happens, we have to explain a little of how OMPI launches
>> processes. Let's take the ssh case, for simplicity (there are other
>> mechanisms it can use to launch on remote servers, but for the purposes of
>> this discussion, they're basically variants of what happens with ssh).
>>
>> 1. mpirun parses the "hosts" hostfile and extracts the list of servers on
>>    which to launch.
>> 2. mpirun fork/execs an ssh command to each remote node, and launches the
>>    Open MPI helper daemon "orted".
>> 3. The orted launches on the remote server, does some housekeeping, and
>>    eventually receives the launch command from mpirun.
>> 4. The launch command contains the executable and argv to fork/exec, and
>>    how many copies to start.
>> 5. For example: mpirun --hostfile hosts -np 4 my_mpi_executable. If the
>>    "hosts" file contains serverA and serverB, then mpirun would launch
>>    2 ssh's -- one each to serverA and serverB. After some startup
>>    negotiation, mpirun would send a launch command telling the orted on
>>    each of serverA and serverB to launch 2 copies of my_mpi_executable.
>> 6. For each child that the orted will create, it:
>>    - creates (up to) 3 pipes, for: stdin, stdout, stderr
>>    - forks
>>    - closes stdin, stdout, stderr
>>    - dups the pipes into 0, 1, 2
>>    - (by default, we actually close stdin on all processes except the
>>      first one)
>>    - execs my_mpi_application
>> 7. In this way, the orted can intercept the stdout/stderr from the process
>>    and send it back to mpirun, which can then write it on its own
>>    stdout/stderr. Shell redirection from mpirun therefore works as
>>    expected.
>> 8. Similarly, the stdin from mpirun can be sent to any process where we
>>    kept stdin open (as mentioned above, by default this is only the first
>>    process).
>>
>> In short: the orted acts as a proxy for the stdout and stderr (and
>> potentially stdin) of all launched processes.
>>
>>> Would it make sense to close stdout/stderr of each checkpointed process
>>> before checkpointing it?
>>
>> Maybe...?
>>
>> But my gut reaction is that you don't want to, because of the "continue"
>> case. I.e., having the orted go through all the IOF setup again could be
>> a bit tricky... We didn't need to do this for other checkpointing systems.
>
> Adrian,
>
> Can you show what the process tree looks like, which subtree you dump (and
> restore), and where the mentioned pipes sit, so that we can decide how to
> dump them and how to recreate them on restore?
>
> I had the impression that you dump the fork()-ed process, and it should
> have pipes in its stdios, right?
>
> Thanks,
> Pavel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/