Ralph,

I'll be happy to test whatever you send. I'll also set up the same revision on a few other systems and see if any patterns emerge with respect to reproducibility.

Greg

On 7/2/14 10:50 PM, Ralph Castain wrote:
Rats - this is one tough race condition. I've been cycling it since your note
without hitting a single failure.

I'm running CentOS 6 on a dual-processor, six-core Xeon with a dedicated disk
installed on the node. That configuration seems to be enough to mask the
problem.

I'll keep trying to find a way to replicate this, and I'll audit the code to
see if I can spot something by inspection. If I do, and send you a patch,
would you be willing/able to test it?


On Jul 2, 2014, at 11:50 AM, Greg Thomsen <gthom...@arlut.utexas.edu> wrote:

Ralph,

I finally got some time to look into this again.  I can reproduce the issue 
when running against the latest nightlies for 1.8.2 (r32119) and 1.9 (r32113).

While verifying that, I noticed that testing on a different system required a
much larger number of iterations to reproduce the issue (~2000), while the
original system still hit it within a handful (~3).  Both are RHEL5 systems;
they differ in the booted kernel revision and the underlying hardware.  The
harder-to-reproduce system is a quad-processor, 16-core Opteron 6276 running
kernel 2.6.18-274.el5xen, while the original system is a quad-processor,
4-core Xeon X5450 running 2.6.18-308.el5.

The only option supplied during configuration was the installation prefix.
Attached is the config.log from the latest trunk nightly (r32113).

Thanks for looking into this.  Let me know if you need anything else.

Greg

On 6/13/14 10:38 PM, Ralph Castain wrote:
Hi Greg

I've been running your script over and over again with the current 1.8.2 and
the SVN developer's trunk, and I cannot get a failure. It just runs merrily
along.

Could you tell me how you configured OMPI to get this behavior?

Thanks
Ralph

On Jun 10, 2014, at 11:59 AM, Ralph Castain <r...@open-mpi.org> wrote:

Ouch - I'll try to chase it down. I'm unaware of anyone passing a significant
amount of data via stdin before, so it's quite possible this has been around
for a while. Normally one avoids that practice.


On Jun 10, 2014, at 11:36 AM, Greg Thomsen <gthom...@arlut.utexas.edu> wrote:

All,

I believe I've found a bug in the I/O forwarding portion of Open MPI that
occasionally causes mpirun to emit additional data on standard output that
was not produced by the application being run.

The application in question reads from standard input and writes to standard
output only on the rank 0 process.  All non-rank-0 processes participate only
in computation and produce no data on standard output.  The application is
used in standard Unix-like pipelines like so:

  A | mpirun -np 4 application | B

Since B expects structured input, it is sensitive to any additional data
injected into the stream.
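
For concreteness, the shape of the application's I/O is roughly the following
minimal sketch.  The real application differs; the chunk size and the
MPI_Bcast distribution step here are just illustrative:

   /* Minimal sketch of the I/O pattern described above -- not the actual
    * application.  Only rank 0 touches stdin/stdout; all other ranks
    * receive data over MPI and compute. */
   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       char buf[4096];
       unsigned long n = 0;
       int rank;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       do {
           if (rank == 0)
               n = fread(buf, 1, sizeof(buf), stdin);  /* rank 0 reads */
           /* Illustrative distribution step: share each chunk with all
            * ranks; n == 0 at EOF ends the loop everywhere. */
           MPI_Bcast(&n, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
           MPI_Bcast(buf, (int)n, MPI_CHAR, 0, MPI_COMM_WORLD);

           /* ... all ranks compute on the chunk here ... */

           if (rank == 0 && n > 0)
               fwrite(buf, 1, n, stdout);              /* rank 0 writes */
       } while (n > 0);

       MPI_Finalize();
       return 0;
   }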

While chasing down the source of this problem, I've observed the following:

* The problem is sensitive to timing.  Using strace to figure out where
the problem lies can easily hide it.  Either of the following changes
how the issue manifests:

   A | mpirun -np 4 strace -o output.txt -e read,write application | B
   A | strace -ff -o output.txt -e read,write mpirun -np 4 application | B

While harder to state definitively, redirecting input from a file and
output to a file, rather than through pipes, also appears to hide the
problem.  Since the workflow in question moves large volumes of data,
file-based I/O isn't feasible and wasn't thoroughly explored during
testing.

* It appears to be correlated with a short I/O operation.  A short read
from the application's standard output maps to the first byte of the
extraneous output sent to B.  Hex dumps indicate that the contents of a
recently used buffer are inadvertently written to B.

The attached test case can show this.
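
For illustration only, the hex dumps are consistent with a stale-length
pattern in a forwarding loop.  The following is a hypothetical C sketch
of such a defect, not Open MPI's actual forwarding code:

   /* Hypothetical sketch of a stale-length forwarding bug -- NOT taken
    * from Open MPI.  A short read() fills only part of buf, but write()
    * uses a byte count remembered from an earlier, larger read, so the
    * tail of the previous buffer contents is re-sent downstream. */
   #include <unistd.h>

   static void forward_buggy(int in_fd, int out_fd)
   {
       char    buf[4096];
       ssize_t len = 0;

       for (;;) {
           ssize_t nread = read(in_fd, buf, sizeof(buf));
           if (nread <= 0)
               break;                /* EOF or read error */
           if (nread == (ssize_t)sizeof(buf))
               len = nread;          /* BUG: only updated on full reads */
           write(out_fd, buf, len);  /* short read re-sends stale tail */
       }
   }

If the real defect has this shape, the extra bytes would always be a copy of
data seen earlier in the stream, which matches what the fixed-pattern input
shows.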

* This is also an issue when forwarding standard error from the rank 0
process.  Modifying the application so that it writes only to standard
error, and then redirecting standard error to standard output in the
shell, still triggers the problem:

   A | mpirun -np 4 application 2>&1 | B

* This seems to occur only at the end of the data stream.  The pipeline
in question works through records, so corruption anywhere but the last
record would have been noticed.  Every condition under which it was
seen regularly pointed to the end of the data stream.

The attached shell script reproduces the problem in every version of Open MPI
tested (1.5.0, 1.6.3, 1.7.4, and 1.8.1).  Without any arguments, it reads a
fixed amount of data from /dev/zero and then compares the size of the output
from the above pipeline.  For versions exhibiting the bug, under the above
conditions, the problem should appear within the first ~20 attempts.  Under
other conditions I've seen the script run for a week without a problem.

Given an input path, it reads the first 1,000,000 bytes from the supplied file.
With a fixed pattern in the data (see the compressed test input), it is easy to
see that the extra data generated is a copy of earlier data in the stream.
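
One hypothetical way to build such self-identifying input (the attached test
input may use a different pattern) is to make every record carry its own
offset, so a duplicated region reveals where it was copied from:

   /* Hypothetical generator for self-identifying test input -- the
    * attached mpi_test_input.bz2 may differ.  Each 16-byte record holds
    * its own stream offset, so any duplicated region in the output
    * shows exactly where in the stream it was copied from. */
   #include <stdio.h>

   int main(void)
   {
       unsigned long offset;

       for (offset = 0; offset < 1000000UL; offset += 16)
           printf("%015lu\n", offset);  /* 15 digits + '\n' = 16 bytes */
       return 0;
   }

Piping that through the failing pipeline and comparing against the generator's
direct output makes the copied region, and its source offset, immediately
visible.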

Hopefully this points someone at the right section of the code.  Let me know
if additional information is needed.

Thanks!

Greg
<mpi_test_input.bz2> <mpi-test.sh>


<config.log.bz2>
