Hi Calvin, Thanks for these very detailed observations. Here's my immediate feedback.
* As I said, rc2 and rc3 are in regrettable disrepair when it comes to a
  32-bit O/S. By the end of this week, we should have an rc4 that is
  intended to also work properly for a 32-bit O/S. If you get a chance
  to test again under rc4, we'll be grateful. (We'll test also, but there
  is so much diversity out there that I don't trust the tests of any
  one group.)

* If DMTCP is failing for a 64-bit O/S right now in ocaml, then we can
  start testing that immediately. I assume this is ocaml from INRIA:
    http://caml.inria.fr/download.en.html
  (version 4.02, currently)
  My understanding is that we should test both for control-C after
  restart and control-C prior to checkpoint. We'll direct our efforts
  there.

You also reported:
> I tried just now launching
> bash and then running python (known to be okay in normal
> circumstances) in the launched shell, and
> then checkpointing, and the result was very strange. ...
> [snip]
> I think you may just not want to cover such strange uses
> which can probably be worked around, which seems
> reasonable.

That's a fair assessment of our past history, when we were trying to
rapidly absorb new use cases into DMTCP. Now that DMTCP has quite good
coverage, and is also robust for the common use cases reported back to
us, we do need to go back and check those corner cases.

Thanks again for all the testing and the great bug reports.

Best,
- Gene

On Mon, Apr 27, 2015 at 02:25:32PM -0400, Calvin Ostrum wrote:
> On Mon, Apr 27, 2015 at 4:52 AM, Gene Cooperman <g...@ccs.neu.edu> wrote:
>
> > [Sorry forgot to copy to list]
> >
> > Thanks for this feedback. Concerning rc3, there are still
> > some issues with a 32-bit O/S. You wrote that you were testing
> > Fedora 19 on the Atom N550. Is that running in 32-bit mode?
>
> Yes, 32bit. Since it is a netbook with an installed maximum of 2G
> memory, I saw no decisive reason to use a 64bit O/S on it.
>
> > Also, just to confirm, I assume that you are interrupting a function
> > prior to checkpointing, and then checkpointing and restarting.
> > It's in this case that you see a segfault on restart.
> > Is this correct?
>
> Based on your use of the word "interrupt", I may have mixed
> together the descriptions of the three separate
> problems I have seen.
>
> When I get the segfault (that's two of the cases) I have not
> interrupted with control-C anything that is running. I just checkpoint
> the language interpreter shell at its command prompt (perhaps
> that is what you mean by "interrupt" above, without control-C) and
> then attempt to restart the checkpoint. Then I get the segfault
> in 32 bit with rc2 and rc3, which as you say below is known about,
> and also, occasionally, with some/most checkpointed processes
> on the 64bit system with rc1,2,3.
>
> In the third case, there is no segfault; this is where I interrupt with
> control-C. Everything works fine, but when I hit control-C in
> the restarted process, it quits the process after passing the
> interrupt to the checkpointed language interpreter, which prints
> out its usual message. Normally the language interpreters
> (including ocaml) do not quit there, but in rc1 on the netbook
> (the only one that works at all there) ocaml does quit, whereas
> uncheckpointed it does not (the correct behavior).
>
> However, I mentioned that running ocaml happens by loading
> bytecode into a bytecode interpreter, by invoking a file
> with the bytecode and a shell #! line. For my purposes, I
> find it works (that is, I can interrupt with control-C the
> checkpointed interpreter without it quitting) if I launch
> the interpreter directly instead of the bytecode with
> the shell #! line. So that is fine with me, but you may
> still consider it a problem that one cannot run bash,
> which immediately calls a program, and the resulting
> checkpointed process does not handle the interrupt
> correctly.
> But it seems there would always be a way
> around that for most users.
>
> > rc2 and rc3 have known bugs in terms of supporting 32-bit mode.
> > After rc1, we changed our restart algorithm a little. In the
> > next few updates this week, we're hoping to fix the 32-bit mode.
>
> So my own observations do fit the known reality in this
> case which is good. I suppose 32-bit mode is not so
> important to support anymore anyway. I just happen
> to have this older netbook still.
>
> > Thank you for the further details. We'll especially
> > look into why ocaml should be more sensitive than R/python.
>
> I guess the issue here might be running a shell first and
> then the language interpreter within it. I tried just now launching
> bash and then running python (known to be okay in normal
> circumstances) in the launched shell, and
> then checkpointing, and the result was very strange. The
> restarted checkpoint gave the python prompt, but then
> seemed unresponsive and the python process was stopped,
> as if control-Z was typed.
>
> I think you may just not want to cover such strange uses
> which can probably be worked around, which seems
> reasonable.
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum