Forwarding because Pavel wasn't subscribed to the devel list, so his message 
was rejected.


Begin forwarded message:

> From: Pavel Emelyanov <xe...@parallels.com>
> Subject: Re: [CRIU] [OMPI devel] Open MPI and CRIU stdout/stderr
> Date: March 19, 2014 9:32:00 AM EDT
> To: Adrian Reber <adr...@lisas.de>
> Cc: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>, Open MPI Developers 
> <de...@open-mpi.org>, "<c...@openvz.org>" <c...@openvz.org>
> 
> On 03/19/2014 05:25 PM, Jeff Squyres (jsquyres) wrote:
>> On Mar 19, 2014, at 9:13 AM, Adrian Reber <adr...@lisas.de> wrote:
>> 
>>> What does Open MPI do with the file descriptors for stdout/stderr?
>> 
>> We admittedly do funny things with stdin, stdout, and stderr...  The short 
>> version is that OMPI intercepts all the stdin, stdout, and stderr from each 
>> MPI process and relays it back up to mpirun through our IOF subsystem (IOF = 
>> I/O forwarding).
>> 
>> Consider: users launch N processes (potentially on multiple different 
>> servers) via
>> 
>>   mpirun --hostfile hosts -np N my_mpi_executable
>> 
>> They also expect to be able to use standard shell redirection via the mpirun 
>> command.  For example:
>> 
>>   mpirun --hostfile hosts -np N my_mpi_executable |& tee out.txt
>> 
>> To explain what happens, we have to explain a little of how OMPI launches 
>> processes. Let's take the ssh case, for simplicity (there are other 
>> mechanisms it can use to launch on remote servers, but for the purposes of 
>> this discussion, they're basically variants of what happens with ssh).
>> 
>> 1. mpirun parses the hosts hostfile and extracts the list of servers on 
>> which to launch.
>> 2. mpirun fork/execs an ssh command to each remote node, and launches the 
>> Open MPI helper daemon "orted".
>> 3. The orted launches on the remote server, does some housekeeping, and 
>> eventually receives the launch command from mpirun
>> 4. The launch command contains the executable and argv to fork/exec, and 
>> how many copies to launch.
>> 5. For example: mpirun --hostfile hosts -np 4 my_mpi_executable.  If the 
>> "hosts" file contains serverA and serverB, then mpirun would launch 2 ssh's 
>> -- one each to serverA and serverB.  After some startup negotiation, mpirun 
>> would send a launch command telling the orted on each of serverA and serverB 
>> to launch 2 copies of my_mpi_executable.
>> 6. For each child that the orted will create, it:
>>   - creates (up to) 3 pipes, for: stdin, stdout, stderr
>>   - forks
>>   - closes stdin, stdout, stderr
>>   - dups the pipes into 0, 1, 2
>>   - (by default, we actually close stdin on all processes except the first 
>> one)
>>   - execs my_mpi_executable (see the C sketch below)
>> 7. In this way, the orted can intercept the stdout/stderr from the process 
>> and send it back to mpirun, which can then write it to its own 
>> stdout/stderr.  And therefore shell redirection from mpirun works as 
>> expected.
>> 8. Similarly, the stdin from mpirun can be sent to any process where we kept 
>> stdin open (as mentioned above, by default, this is only the first process).
>> 
>> In short: the orted acts as a proxy for the stdout and stderr (and 
>> potentially stdin) for all launched processes.
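>> 
>> To make step 6 concrete, here is a minimal POSIX sketch of the per-child 
>> setup and forwarding -- illustrative only, not the actual orted source, 
>> which handles all three streams for many children out of an event loop:
>> 
>>   /* Sketch: pipe + fork + dup2 + exec, then relay the child's output.
>>    * Only stdout is shown; stderr (and stdin, kept open only for the
>>    * first process) work the same way. */
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <sys/wait.h>
>> 
>>   int main(void)
>>   {
>>       int out_pipe[2];
>>       if (pipe(out_pipe) < 0) {
>>           perror("pipe");
>>           exit(1);
>>       }
>> 
>>       pid_t pid = fork();
>>       if (pid == 0) {
>>           /* Child: wire the pipe's write end to fd 1, then exec. */
>>           close(out_pipe[0]);
>>           dup2(out_pipe[1], STDOUT_FILENO);
>>           close(out_pipe[1]);
>>           execlp("my_mpi_executable", "my_mpi_executable", (char *)NULL);
>>           perror("execlp");
>>           _exit(127);
>>       }
>> 
>>       /* Parent (the orted): read the child's output from the pipe and
>>        * relay it.  Here we just copy it to our own stdout; the real
>>        * orted ships the data back to mpirun, which writes it to *its*
>>        * stdout, so shell redirection on mpirun sees everything. */
>>       close(out_pipe[1]);
>>       char buf[4096];
>>       ssize_t n;
>>       while ((n = read(out_pipe[0], buf, sizeof(buf))) > 0)
>>           write(STDOUT_FILENO, buf, (size_t)n);
>>       waitpid(pid, NULL, 0);
>>       return 0;
>>   }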
>> 
>>> Would it make sense to close stdout/stderr of each checkpointed process
>>> before checkpointing it?
>> 
>> Maybe...?
>> 
>> But my gut reaction is that you don't want to because of the "continue" 
>> case.  I.e., having the orted go through all the IOF setup again could be a 
>> bit tricky...  We didn't need to do this for other checkpointing systems.
>> 
> 
> 
> Adrian,
> 
> Can you show what the process tree looks like, which subtree you dump (and 
> restore), and where the mentioned pipes sit, so that we can decide how to 
> dump them and how to recreate them on restore?
> 
> I had the impression that you dump the fork()-ed process, and that it 
> should have pipes on its stdio descriptors, right?
> 
> Thanks,
> Pavel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
