Rats - this is one tough race condition. I've been cycling it since your note without hitting a single failure.
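For reference, the cycling is nothing fancier than a loop around your script, roughly like the sketch below (this assumes mpi-test.sh exits non-zero when it detects a size mismatch; adjust the test if yours reports failure differently):

    # Keep re-running the reproducer until it reports a failure.
    i=0
    while ./mpi-test.sh; do
        i=$((i + 1))
    done
    echo "first failure after $i clean iterations"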
I'm running CentOS 6 on a dual-processor, 6-core Xeon with a dedicated disk installed on the node. That apparently is enough to mask the problem. I'll keep trying to find a way to replicate this, and will also audit the code to see if I can spot something by eye. If I do and send you a patch, would you be willing/able to test it?

On Jul 2, 2014, at 11:50 AM, Greg Thomsen <gthom...@arlut.utexas.edu> wrote:

> Ralph,
>
> I finally got some time to look into this again. I can reproduce the issue
> when running against the latest nightlies for 1.8.2 (r32119) and 1.9
> (r32113).
>
> While verifying that, I noticed that reproducing it on a different system
> required a much larger number of iterations (~2000), while on the original
> system the iteration count was unchanged (~3). Both are RHEL5 systems; they
> differ in the kernel revision booted and in the underlying hardware. The
> harder-to-reproduce system is a quad-processor, 16-core Opteron 6276
> running 2.6.18-274.el5xen, while the original system is a quad-processor,
> 4-core Xeon X5450 running 2.6.18-308.el5.
>
> The only option supplied during configuration was the install prefix.
> Attached is the config.log from the latest trunk nightly (r32113).
>
> Thanks for looking into this. Let me know if you need anything else.
>
> Greg
>
> On 6/13/14 10:38 PM, Ralph Castain wrote:
>> Hi Greg
>>
>> I've been running your script over and over again with the current 1.8.2
>> and the svn developer's trunk, and I cannot get a failure. It just merrily
>> runs.
>>
>> Could you tell me how you configured OMPI to get this behavior?
>>
>> Thanks
>> Ralph
>>
>> On Jun 10, 2014, at 11:59 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Ouch - I'll try to chase it down. I'm unaware of anyone passing a
>>> significant amount of data via stdin before, so it's quite possible this
>>> has been around for a while. Normally one avoids that practice.
>>>
>>> On Jun 10, 2014, at 11:36 AM, Greg Thomsen <gthom...@arlut.utexas.edu>
>>> wrote:
>>>
>>>> All,
>>>>
>>>> I believe I've found a bug in the I/O forwarding portion of OpenMPI
>>>> that occasionally causes mpirun to generate additional data on standard
>>>> output that was not produced by the application being run.
>>>>
>>>> The application in question reads from standard input and writes to
>>>> standard output only on the rank 0 process. All non-rank-0 processes
>>>> participate only in computation and produce no data on standard output.
>>>> The application is used in standard Unix-like pipelines like so:
>>>>
>>>>     A | mpirun -np 4 application | B
>>>>
>>>> Since B expects structured input, it is sensitive to any additional
>>>> data being generated.
>>>>
>>>> While chasing down the source of this problem, I've observed the
>>>> following:
>>>>
>>>> * The problem is sensitive to timing. Using strace to figure out where
>>>>   the problem lies can easily hide it. Either of the following changes
>>>>   how the issue is expressed:
>>>>
>>>>     A | mpirun -np 4 strace -o output.txt -e read,write application | B
>>>>     A | strace -ff -o output.txt -e read,write mpirun -np 4 application | B
>>>>
>>>>   While harder to state definitively, redirecting input and output
>>>>   to/from files, rather than through pipes, also appears to hide the
>>>>   problem. Since the workflow in question moves large volumes of data,
>>>>   file-based I/O isn't feasible and wasn't thoroughly explored during
>>>>   testing.
>>>>
>>>> * It appears to be correlated with a short I/O operation.
>>>>   A short read from the application's standard output maps to the first
>>>>   byte of the extraneous output sent to B. Looking at hex dumps shows
>>>>   that the contents of a recent buffer are inadvertently written to B.
>>>>
>>>>   The attached test case can show this.
>>>>
>>>> * This is also an issue when forwarding standard error from the rank 0
>>>>   process. Modifying the application so that it writes only to standard
>>>>   error, and then redirecting standard error to standard output in the
>>>>   shell, still causes the problem:
>>>>
>>>>     A | mpirun -np 4 application 2>&1 | B
>>>>
>>>> * This seems to occur only at the end of the data stream. The pipeline
>>>>   in question works through records, and if it occurred earlier than
>>>>   the last record, it would be noticed. The conditions where it was
>>>>   seen regularly always pointed to the end of the data stream.
>>>>
>>>> The attached shell script reproduces the problem in every version of
>>>> OpenMPI tested (1.5.0, 1.6.3, 1.7.4, and 1.8.1). Without any arguments,
>>>> it reads a fixed amount of data from /dev/zero and then compares the
>>>> size of the output from the above pipeline. For versions exhibiting the
>>>> bug, under the above conditions, the problem should be seen within the
>>>> first ~20 attempts. Under other conditions I've seen the script run for
>>>> a week without a problem.
>>>>
>>>> With an input path, it reads the first 1,000,000 bytes from the path
>>>> supplied. With a fixed pattern in the data (see the compressed test
>>>> input), it is easy to see that the extra data generated is a copy from
>>>> earlier in the data stream.
>>>>
>>>> Hopefully this gets someone into the right section of the code. Let me
>>>> know if additional information is needed.
>>>>
>>>> Thanks!
>>>>
>>>> Greg
>>>>
>>>> <mpi_test_input.bz2><mpi-test.sh>
>
> <config.log.bz2>
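P.S. For anyone following along on the list without the attachments, my reading of Greg's mpi-test.sh boils down to roughly the sketch below. The real script and the patterned test input are attached to the earlier posts; ./application here stands in for any MPI code that copies stdin to stdout on rank 0 only, and the byte count is only illustrative.

    #!/bin/sh
    # Rough paraphrase of the reproducer: push a known number of bytes
    # through the pipeline and compare the byte count coming out the far end.
    nbytes=1000000
    actual=$(head -c "$nbytes" /dev/zero | mpirun -np 4 ./application | wc -c)
    if [ "$actual" -ne "$nbytes" ]; then
        echo "FAIL: expected $nbytes bytes, got $actual bytes"
        exit 1
    fi
    echo "OK: $actual bytes"

Run in a loop as above, a failing iteration shows up as a byte count larger than expected, with the extra bytes appearing at the tail of the stream.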