Forwarding to the list and other team members since I failed
to include them in the original response to Kapil ... EM


-------- Forwarded Message --------
Subject: Re: [Dmtcp-forum] A possible issue around pipes
Date: Thu, 13 Aug 2015 16:08:20 -0400
From: Eliot Moss <m...@cs.umass.edu>
Reply-To: m...@cs.umass.edu
To: Kapil Arya <kapil.arya...@gmail.com>

Thank you, Kapil.

Here is a guess as to the pattern that may be causing it.

Considering the pipeline gunzip | java | gzip, one of the
situations that appears to be a problem is a java tracing
tool.  It produces large bursts of output from java to gzip,
and of course gzip is somewhat slower, so the pipe probably
backs up.  However, java is multithreaded and may continue
execution while the large output buffer drains (I'd have to
check the code to be sure, but I don't think java waits for
all the output to be sent to the pipe before continuing at
least some threads).
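
To make "the pipe backs up" concrete, here is a minimal sketch
(Python, POSIX only) that fills a pipe nobody is reading and
reports how many bytes the kernel buffered before a write
would have blocked.  The function name fill_pipe is made up
for this illustration; it is not part of DMTCP or java.  On
Linux the result is commonly 64 KiB, but that is a system
default and not something to rely on.

```python
import os


def fill_pipe(chunk_size=4096):
    """Write to an unread pipe until it is full; return bytes buffered.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode, so a full pipe raises instead of hanging.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        # The pipe is full: a blocking writer would stall here
        # until the reader (gzip, in the pipeline above) drains it.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total

# Example use: print(fill_pipe())
```

A burst larger than this capacity means the writing thread has
to wait on the reader, which is the window where other java
threads may keep running.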

Again, I can't say for sure exactly what happens -- I just
have mangled output, though not *random* data -- but I think
some output is missing, i.e., should be output by java (and
maybe is) but is not received by gzip and thus does not
appear in the .gz file.  The .gz file itself is well-formed,
so I don't think it is bytes getting dropped from gzip to
the .gz output file.
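
One way to turn "I think some output is missing" into
something checkable is sketched below.  It assumes (this is
my assumption, not how the tracing tool works today) that each
output record can be made to start with a decimal sequence
number; the helper names find_gaps and find_gaps_in_gz are
hypothetical.  The sketch scans the decompressed .gz and
reports every place where the numbering jumps.

```python
import gzip


def find_gaps(lines):
    """Return (expected, found) pairs where sequence numbers jump.

    Assumes each record line starts with a decimal sequence
    number; lines that do not are skipped.
    """
    gaps = []
    expected = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields or not fields[0].isdigit():
            continue
        seq = int(fields[0])
        if expected is not None and seq != expected:
            gaps.append((expected, seq))
        expected = seq + 1
    return gaps


def find_gaps_in_gz(path_or_file):
    """Run find_gaps over a gzip file (a path or an open binary file)."""
    with gzip.open(path_or_file, "rt") as f:
        return find_gaps(f)
```

A gap in the numbering with an otherwise well-formed .gz file
would point at loss between java and gzip rather than between
gzip and the file.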

Clearly this could have to do with java, specifically, or
with pipes generally.  The output in question is coming
from a complex java *agent* -- a C++ library dynamically
linked with the running java instance, that is not DMTCP
aware (shouldn't need to be, right?).

I have also seen (weaker evidence) problems with java or
gawk as the middle of a gunzip | ... | gzip pipeline
without the complex java agent involved.

Also, some problems might arise when either the program or
the checkpoint itself takes a long time because an NFS-mounted
drive is being slow (for a variety of network or remote host
reasons).  Thus finding the problem might require injecting
substantial artificial delays, or at least doing thought
experiments about timed-out I/O operations ...
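
As a rough sketch of injecting such a delay, a slow
pass-through stage could be placed between java and gzip.
The function name slow_copy and the default chunk size and
delay are made up for illustration; it only imitates a slow
consumer and does not reproduce real NFS behavior.

```python
import time


def slow_copy(src, dst, chunk_size=65536, delay=0.5):
    """Copy src to dst in chunks, sleeping before each write.

    Hypothetical helper: src and dst are binary file objects.
    Returns the number of bytes copied.
    """
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Artificial stall, so the upstream pipe has time to fill.
        time.sleep(delay)
        dst.write(chunk)
        copied += len(chunk)
    dst.flush()
    return copied

# As a pipeline stage one would call, for example:
#   slow_copy(sys.stdin.buffer, sys.stdout.buffer)
```

Comparing the byte count it returns with what java believes
it wrote would help narrow down where output goes missing.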

Hope these notions help ...

Regards -- Eliot



