Forwarding to the list and other team members since I failed to include them in the original response to Kapil ... EM

-------- Forwarded Message --------
Subject: Re: [Dmtcp-forum] A possible issue around pipes
Date: Thu, 13 Aug 2015 16:08:20 -0400
From: Eliot Moss <m...@cs.umass.edu>
Reply-To: m...@cs.umass.edu
To: Kapil Arya <kapil.arya...@gmail.com>

Thank you, Kapil. Here is a guess as to the pattern that may be causing it.

Considering the pipeline gunzip | java | gzip, one of the situations that appears to be a problem is a java tracing tool. It produces bursts of output of considerable size from the java to the gzip, and of course gzip is somewhat slower, so the pipe probably backs up. However, java is multithreaded and may continue execution while the large output buffer drains (I'd have to check the code to be sure, but I don't think java waits for the output to all be sent to the pipe before continuing at least some threads).

Again, I can't say for sure exactly what happens -- I just have mangled output, though not *random* data -- but I think some output is missing, i.e., should be output by java (and maybe is) but is not received by gzip and thus does not appear in the .gz file. The .gz file itself is well-formed, so I don't think it is bytes getting dropped from gzip to the .gz output file.

Clearly this could have to do with java, specifically, or with pipes generally. The output in question is coming from a complex java *agent* -- a C++ library dynamically linked with the running java instance, that is not DMTCP aware (shouldn't need to be, right?). I have also seen (weaker evidence) problems with java or gawk as the middle of a gunzip | ... | gzip pipeline without the complex java agent involved.

Also, some problems might arise when either the program, or taking a checkpoint, takes a long time because an NFS mounted drive is being slow (for a variety of network or remote host issues). Thus finding the problem might require injecting substantial artificial delays, or at least doing thought experiments about timed-out I/O operations ...

Hope these notions help ...

Regards -- Eliot
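
As a starting point for the artificial-delay experiment suggested above, here is a minimal sketch of a harness that pushes bursty output through a pipe to a deliberately slow reader and compares what was sent against what arrived. It is not DMTCP-specific and is not taken from the original setup: the function name, burst sizes, and delay are all hypothetical placeholders, and a real test would run something like this under DMTCP and take checkpoints while the pipe is backed up.

```python
import hashlib
import os
import threading
import time


def run_pipe_check(bursts=20, burst_size=256 * 1024, reader_delay=0.001):
    """Send bursty data through a pipe to a slow reader.

    Returns (bytes_sent, bytes_received, sent_digest, received_digest).
    All parameters are illustrative assumptions, not measured values.
    """
    read_fd, write_fd = os.pipe()
    sent_hash = hashlib.sha256()
    recv_hash = hashlib.sha256()
    counts = {"sent": 0, "received": 0}

    def producer():
        # Stands in for the fast, bursty writer (the java stage).
        with os.fdopen(write_fd, "wb") as out:
            for i in range(bursts):
                chunk = bytes([i % 256]) * burst_size
                out.write(chunk)  # blocks once the pipe buffer is full
                sent_hash.update(chunk)
                counts["sent"] += len(chunk)

    def consumer():
        # Stands in for the slower reader (the gzip stage).
        with os.fdopen(read_fd, "rb") as inp:
            while True:
                data = inp.read(65536)
                if not data:  # EOF: the writer closed its end
                    break
                recv_hash.update(data)
                counts["received"] += len(data)
                time.sleep(reader_delay)  # injected artificial delay

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return (counts["sent"], counts["received"],
            sent_hash.hexdigest(), recv_hash.hexdigest())


if __name__ == "__main__":
    sent, received, sent_digest, recv_digest = run_pipe_check()
    print("bytes sent:    ", sent)
    print("bytes received:", received)
    print("digests match: ", sent_digest == recv_digest)
```

Without checkpointing, the byte counts and digests should agree; a mismatch that only shows up when a checkpoint is taken mid-transfer would point at the pipe handling rather than at java specifically.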