On 5/12/2015 4:49 PM, Gene Cooperman wrote:
> 1) We still owe you on the restart issue of 2/26/15. It's surprisingly
> subtle.
Thanks for keeping at it.

> 2) "Am I right in guessing that a file on a remote drive is identified
> in part by the uuid of the local mount point, and in part by information
> about the remote drive?"
>
> I believe that DMTCP knows nothing about local or remote drives or
> mount points. It should view files as part of a single unified
> filesystem.

Ok, so here's a model of what might be happening. DMTCP notes the _device_ a file comes from. When a system is rebuilt, the device looks different. All we need is one open file on that device to mess up the process.

> Is there a simple way that we could locally test what you're seeing,
> without having to crash a Grid Engine compute node :-).

Not sure, since I'm not 100% sure what is happening. I've even seen cases where a recently taken checkpoint fails (and all earlier ones do too).

> 3) "For java jar files, it appears that every checkpoint makes another copy of
> an open jar file -- even when (as far as I know) such files are read
> only."
>
> The default policy of DMTCP should be that if the file is read-only,
> and even if it is writeable but the offset is at the end of the file,
> then DMTCP should _not_ be making a copy of the file. The flag
> --checkpoint-open-files for dmtcp_launch is intended to force DMTCP
> to make copies of open files in order to overcome that default behavior.
>
> If you're seeing something different, could you confirm that? In that
> case, I'll check again locally, to verify this bug.

Thanks. It's not different, just suboptimal I think. The files are almost certainly opened only for reading (though I may be able to check that). The position in the files may skip around: jar files are probed repeatedly to load different class files, so they are accessed in a random-access way more than a strictly sequential one. They may also be mapped into memory rather than accessed through read/write calls.

Thanks for the other info ...
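For checking both points on a Linux node, here is a minimal sketch in Python. It is an illustration only, not DMTCP code. It assumes the Linux /proc layout: /proc/<pid>/fdinfo/<fd> reports each descriptor's offset ("pos") and open flags (in octal), and /proc/<pid>/maps lists memory-mapped files with their hex major:minor device. All function names are hypothetical, and needs_copy is only a restatement of the default policy quoted above (skip read-only files and writable files positioned at end of file), not DMTCP's actual logic. Running it against the Java process's PID on an original and a rebuilt node would show whether the access mode, offsets, or device numbers of the open or mapped jar files differ.

```python
import os
import stat
import sys

# Linux-specific assumption: the low two bits of the open flags hold the
# access mode (O_RDONLY = 0, O_WRONLY = 1, O_RDWR = 2).
ACCMODE = os.O_WRONLY | os.O_RDWR


def fd_pos_and_flags(pid, fd):
    """Return (offset, open flags) for one fd, parsed from /proc/<pid>/fdinfo/<fd>."""
    fields = {}
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    # The kernel prints 'pos' in decimal and 'flags' in octal.
    return int(fields["pos"]), int(fields["flags"], 8)


def needs_copy(flags, pos, size):
    """Hypothetical restatement of the quoted default policy: no copy if the
    file is read-only, or if it is writable but positioned at end of file."""
    if flags & ACCMODE == os.O_RDONLY:
        return False
    return pos != size


def open_regular_files(pid):
    """Yield (fd, path, 'major:minor' device, offset, flags, size) per regular file."""
    fd_dir = f"/proc/{pid}/fd"
    for name in sorted(os.listdir(fd_dir), key=int):
        link = os.path.join(fd_dir, name)
        try:
            path = os.readlink(link)
            st = os.stat(link)  # stat() follows the link to the open file itself
            pos, flags = fd_pos_and_flags(pid, name)
        except OSError:
            continue  # fd closed in the meantime, or no permission
        if stat.S_ISREG(st.st_mode):
            dev = f"{os.major(st.st_dev)}:{os.minor(st.st_dev)}"
            yield int(name), path, dev, pos, flags, st.st_size


def mapped_files(pid):
    """Return {path: device} for file-backed mappings in /proc/<pid>/maps.
    The device column there is printed as hex 'major:minor'."""
    result = {}
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            parts = line.split(maxsplit=5)
            if len(parts) == 6 and parts[5].startswith("/"):
                result[parts[5].rstrip("\n")] = parts[3]
    return result


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for fd, path, dev, pos, flags, size in open_regular_files(pid):
        print(f"fd {fd:3} dev {dev:7} copy={needs_copy(flags, pos, size)!s:5} {path}")
    for path, dev in sorted(mapped_files(pid).items()):
        print(f"mapped dev {dev:7} {path}")
```

Run it with the target PID as its only argument; with no argument it inspects its own process. Reading another user's /proc entries needs matching permissions. Note that /proc/<pid>/fd shows decimal device numbers while /proc/<pid>/maps shows hex, so compare like with like.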
Regards,
Eliot

_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum