Rats - this is one tough race condition. I've been cycling it since your note 
without hitting a single problem.

I'm running CentOS 6 on a dual-processor, 6-core Xeon, with a dedicated disk 
installed on the node. Apparently that is enough to mitigate the problem.

I'll keep trying to find a way to replicate this, and will audit the code to 
see if I can visually spot something. If I find something and send you a patch, 
would you be willing/able to test it?


On Jul 2, 2014, at 11:50 AM, Greg Thomsen <gthom...@arlut.utexas.edu> wrote:

> Ralph,
> 
> I finally got some time to look into this again.  I can reproduce the issue 
> when running against the latest nightlies for 1.8.2 (r32119) and 1.9 (r32113).
> 
> While verifying that, I noticed that testing on a different system required a 
> much larger number of iterations (~2000), while on the original system the 
> iteration count was unchanged (~3).  Both are RHEL5 systems that differ in the 
> booted kernel revision and the underlying hardware.  The harder-to-reproduce 
> system is a quad-processor, 16-core Opteron 6276 running 2.6.18-274.el5xen, 
> while the original system is a quad-processor, 4-core Xeon X5450 running 
> 2.6.18-308.el5.
> 
> The only option supplied during configuration was the install prefix.  
> Attached is the config log from building the latest trunk nightly (r32113).
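> 
> In other words, configuration amounted to nothing more than the following 
> (the prefix path below is a placeholder, not the actual install location):
> 
>   ./configure --prefix=/path/to/install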
> 
> Thanks for looking into this.  Let me know if you need anything else.
> 
> Greg
> 
> On 6/13/14 10:38 PM, Ralph Castain wrote:
>> Hi Greg
>> 
>> I've been running your script over and over again with the current 1.8.2 and 
>> svn developer's trunk, and I cannot get a failure. It just merrily runs.
>> 
>> Could you tell me how you configured OMPI to get this behavior?
>> 
>> Thanks
>> Ralph
>> 
>> On Jun 10, 2014, at 11:59 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Ouch - I'll try to chase it down. I'm unaware of anyone passing a 
>>> significant amount of data via stdin before, so it's quite possible this 
>>> has been around for a while. Normally one avoids that practice.
>>> 
>>> 
>>> On Jun 10, 2014, at 11:36 AM, Greg Thomsen <gthom...@arlut.utexas.edu> 
>>> wrote:
>>> 
>>>> All,
>>>> 
>>>> I believe I've found a bug in the I/O forwarding portion of OpenMPI which 
>>>> occasionally causes mpirun to generate additional data on standard output 
>>>> that was not produced by the application being run.
>>>> 
>>>> The application in question reads from standard input and writes to 
>>>> standard output only on the rank 0 process.  All non-rank 0 processes only 
>>>> participate in computation and do not produce data on standard output.  
>>>> The application is used in standard Unix-like pipelines like so:
>>>> 
>>>>  A | mpirun -np 4 application | B
>>>> 
>>>> Since B is looking for structured input, it is sensitive to additional 
>>>> data being generated.
>>>> 
>>>> While chasing down the source of this problem, I've observed the following:
>>>> 
>>>> * The problem is sensitive to timing.  Using strace to figure out where
>>>> the problem lies can easily hide it.  Either of the following changed how
>>>> the issue manifested:
>>>> 
>>>>   A | mpirun -np 4 strace -o output.txt -e read,write application | B
>>>>   A | strace -ff -o output.txt -e read,write mpirun -np 4 application
>>>>   | B
>>>> 
>>>> While harder to state definitively, redirecting input from a file and
>>>> output to a file, rather than through pipes, also appears to hide the
>>>> problem.  Since the workflow in question involves large volumes of data,
>>>> using file-based I/O isn't feasible and wasn't thoroughly explored during
>>>> testing.
>>>> 
>>>> * It appears to be correlated with a short I/O operation.  A short read
>>>> from the application's standard output maps to the first byte of the
>>>> extraneous output sent to B.  Looking at hex dumps indicates that the
>>>> contents of a recent buffer are inadvertently written to B.
>>>> 
>>>> The attached test case can show this (a sketch of that comparison follows
>>>> this list).
>>>> 
>>>> * This is also an issue for forwarding standard error from the rank 0
>>>> process.  Modifying the application so that it only writes to standard
>>>> error, and then redirecting standard error to standard output in the
>>>> shell, will still cause the problem:
>>>> 
>>>>   A | mpirun -np 4 application 2>&1 | B
>>>> 
>>>> * This seems to occur only at the end of the data stream.  The pipeline
>>>> in question works through records, and if the corruption occurred earlier
>>>> than the last record it would have been noticed.  The conditions where it
>>>> was seen regularly always pointed to the end of the data stream.
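>>>> 
>>>> As a rough sketch, that hex dump comparison can be done along these lines
>>>> (file names and the single-rank "reference" run are only illustrative and
>>>> are not part of the attached script; the cats keep the I/O going through
>>>> pipes, which is the condition under which the problem appears):
>>>> 
>>>>   # Generate a reference output whose contents are assumed correct, then a
>>>>   # multi-rank run through the same pipe-based path.
>>>>   cat input.dat | mpirun -np 1 application | cat > reference.out
>>>>   cat input.dat | mpirun -np 4 application | cat > actual.out
>>>> 
>>>>   # cmp reports the first differing byte offset; hex dumps around that
>>>>   # offset show the extra data is a copy of an earlier buffer.
>>>>   cmp reference.out actual.out
>>>>   diff <(xxd reference.out) <(xxd actual.out) | head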
>>>> 
>>>> The attached shell script reproduces the problem in every version of 
>>>> OpenMPI tested (1.5.0, 1.6.3, 1.7.4, and 1.8.1).  Without any arguments, 
>>>> it reads a fixed amount of data from /dev/zero and then compares the size 
>>>> of the output from the above pipeline against the expected size.  For 
>>>> versions exhibiting the bug under the above conditions, the problem should 
>>>> be seen within the first ~20 attempts.  Under other conditions I've seen 
>>>> the script run for a week without a problem.
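>>>> 
>>>> The no-argument mode boils down to roughly the following (a sketch only; 
>>>> the byte count, iteration count, and application name are illustrative, 
>>>> and the attached mpi-test.sh is the authoritative version):
>>>> 
>>>>   # Push a known number of bytes through the pipeline repeatedly and flag
>>>>   # any run whose output size differs from the input size.  This assumes
>>>>   # an application that copies its input to its output unchanged.
>>>>   BYTES=1000000
>>>>   for i in $(seq 1 20); do
>>>>       got=$(head -c ${BYTES} /dev/zero | mpirun -np 4 application | wc -c)
>>>>       if [ "${got}" -ne "${BYTES}" ]; then
>>>>           echo "iteration ${i}: ${got} bytes out, expected ${BYTES}"
>>>>       fi
>>>>   done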
>>>> 
>>>> With an input path, it reads the first 1,000,000 bytes from the path 
>>>> supplied.  With a fixed pattern in the data (see the compressed test 
>>>> input), it is easy to see that the extra data generated is a copy from 
>>>> earlier in the data stream.
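>>>> 
>>>> If regenerating a patterned input is more convenient than using the 
>>>> attached file, numbered lines work just as well (the attached 
>>>> mpi_test_input.bz2 may use a different pattern):
>>>> 
>>>>   # Every line is unique, so any region that appears twice in the output
>>>>   # can be traced back to its original offset by its line numbers.
>>>>   seq -w 1 200000 | head -c 1000000 > patterned_input.dat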
>>>> 
>>>> Hopefully this points someone at the right section of the code.  Let me 
>>>> know if additional information is needed.
>>>> 
>>>> Thanks!
>>>> 
>>>> Greg
>>>> <mpi_test_input.bz2><mpi-test.sh>
>>> 
>> 
> 
> <config.log.bz2>
