Re: [OMPI devel] mpirun Produces Extraneous Output

Ralph Castain Fri, 13 Jun 2014 23:38:57 -0400 (EDT)

Hi Greg

I've been running your script over and over again with the current 1.8.2 and 
svn developer's trunk, and I cannot get a failure. It just merrily runs.


Could you tell me how you configured OMPI to get this behavior?

Thanks
Ralph

On Jun 10, 2014, at 11:59 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Ouch - I'll try to chase it down.I'm unaware of anyone passing a significant 
> amount of data via stdin before, so it's quite possible this has been around 
> for awhile. Normally one avoids that practice.
> 
> 
> On Jun 10, 2014, at 11:36 AM, Greg Thomsen <gthom...@arlut.utexas.edu> wrote:
> 
>> All,
>> 
>> I believe I've found a bug in the I/O forwarding portion of OpenMPI which 
>> occasionally causes mpirun to generate additional data on standard output 
>> that was not produced by the application being run.
>> 
>> The application in question reads from standard input and writes to standard 
>> output only on the rank 0 process.  All non-rank 0 processes only 
>> participate in computation and do not produce data on standard output.  The 
>> application is used in standard Unix-like pipelines like so:
>> 
>>  A | mpirun -np 4 application | B
>> 
>> Since B is looking for structured input, it is sensitive to additional data 
>> being generated.
>> 
>> While chasing down the source of this problem, I've observed the following:
>> 
>> * The problem is sensitive to timing.  Using strace to figure out where
>> the problem lies can easily hide it.  Either of the following would
>> change how the issue was expressed:
>> 
>>   A | mpirun -np 4 strace -o output.txt -e read,write application | B
>>   A | strace -ff -o output.txt -e read,write mpirun -np 4 application
>>   | B
>> 
>> While harder to state definitively, redirecting in from file and out
>> to file, rather than through pipes, also appears to hide the problem.
>> Since the workflow in question has large volumes of data, using
>> file-based I/O isn't feasible and wasn't thoroughly explored during
>> testing.
>> 
>> * It appears to be correlated to a short I/O operation.  A short read
>> from application's standard output maps to the first byte of the
>> extraneous output sent to B.  Looking at hex dumps indicates that the
>> contents of a recent buffer are inadvertently written to B.
>> 
>> The attached text case can show this.
>> 
>> * This also is an issue for forwarding standard error from the rank 0
>> process.  Modifying application so that it only writes to standard
>> error, and then redirecting standard error to standard output in the
>> shell, will still cause the problem:
>> 
>>   A | mpirun -np 4 application 2>&1 | B
>> 
>> * This seems to only occur at the end of the data stream.  The pipeline
>> in question works through records and if it occurred earlier than the
>> last, it would be noticed.  The conditions where it was seen
>> regularly always pointed to the end of the data stream.
>> 
>> The attached shell script reproduces the problem in every version of OpenMPI 
>> tested (1.5.0, 1.6.3, 1.7.4, and 1.8.1).  Without any arguments, it reads a 
>> fixed amount of data from /dev/zero and then compares the size of the output 
>> from the above pipeline.  For versions exhibiting the bug, and the above 
>> conditions, the problem should be seen within the first ~20 attempts.  For 
>> other conditions I've seen the script run for a week without problem.
>> 
>> With an input path, it reads the first 1,000,000 bytes from the path 
>> supplied.  With a fixed pattern in the data (see the compressed test input), 
>> it is easy to see that the extra data generated is a copy from earlier in 
>> the data stream.
>> 
>> Hopefully this gets someone in the right section of the code.  Let me know 
>> if additional information is needed.
>> 
>> Thanks!
>> 
>> Greg
>> <mpi_test_input.bz2><mpi-test.sh>_______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/06/14999.php
>

Re: [OMPI devel] mpirun Produces Extraneous Output

Reply via email to