On Mon, Apr 27, 2015 at 4:52 AM, Gene Cooperman <g...@ccs.neu.edu> wrote:
[Sorry, forgot to copy to list]

> Thanks for this feedback. Concerning rc3, there are still
> some issues with a 32-bit O/S. You wrote that you were testing
> Fedora 19 on the Atom N550. Is that running in 32-bit mode?

Yes, 32-bit. Since it is a netbook with an installed maximum of 2G memory, I saw no decisive reason to use a 64-bit O/S on it.

> Also, just to confirm, I assume that you are interrupting a function
> prior to checkpointing, and then checkpointing and restarting.
> It's in this case that you see a segfault on restart.
> Is this correct?

Based on your use of the word "interrupt", I may have mixed together the descriptions of the three separate problems I have seen.

When I get the segfault (that's two of the cases), I have not interrupted anything that is running with control-C. I just checkpoint the language interpreter shell at its command prompt (perhaps that is what you mean by "interrupt" above, without control-C) and then attempt to restart the checkpoint. Then I get the segfault on 32-bit with rc2 and rc3, which as you say below is a known issue, and also, occasionally, with some/most checkpointed processes on the 64-bit system with rc1, rc2, and rc3.

In the third case there is no segfault; this is where I interrupt with control-C. Everything works fine, but when I hit control-C in the restarted process, it quits the process after passing the interrupt to the checkpointed language interpreter, which prints out its usual message. Normally the language interpreters (including ocaml) do not quit there, but with rc1 on the netbook (the only one that works at all there), ocaml does quit, whereas uncheckpointed it does not (the correct behavior). However, I mentioned that running ocaml happens by loading bytecode into a bytecode interpreter, by invoking a file with the bytecode and a shell #! line.
For my purposes, I find it works (that is, I can interrupt the checkpointed interpreter with control-C without it quitting) if I launch the interpreter directly instead of the bytecode with the shell #! line. So that is fine with me, but you may still consider it a problem that one cannot run bash, which immediately calls a program, and have the resulting checkpointed process handle the interrupt correctly. But it seems there would always be a way around that for most users.

> rc2 and rc3 have known bugs in terms of supporting 32-bit mode.
> After rc1, we changed our restart algorithm a little. In the
> next few updates this week, we're hoping to fix the 32-bit mode.

So my own observations do fit the known reality in this case, which is good. I suppose 32-bit mode is not so important to support anymore anyway. I just happen to still have this older netbook.

> Thank you for the further details. We'll especially
> look into why ocaml should be more sensitive than R/python.

I guess the issue here might be running a shell first and then the language interpreter within it. I tried just now launching bash and then running python (known to be okay in normal circumstances) in the launched shell, and then checkpointing, and the result was very strange. The restarted checkpoint gave the python prompt, but then seemed unresponsive, and the python process was stopped, as if control-Z had been typed. I think you may just not want to cover such strange uses, which can probably be worked around, and that seems reasonable.
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum