On Mon, Apr 27, 2015 at 4:52 AM, Gene Cooperman <g...@ccs.neu.edu> wrote:

[Sorry forgot to copy to list]

>     Thanks for this feedback.  Concerning rc3, there are still
> some issues with a 32-bit O/S.  You wrote that you were testing
> Fedora 19 on the Atom N550.  Is that running in 32-bit mode?

Yes, 32-bit.  Since it is a netbook with an installed maximum of 2 GB
of memory, I saw no decisive reason to use a 64-bit O/S on it.

> Also, just to confirm, I assume that you are interrupting a function
> prior to checkpointing, and then checkpointing and restarting.
> It's in this case that you see a segfault on restart.
> Is this correct?

Based on your use of the word "interrupt", I may have mixed
together the descriptions of the three separate
problems I have seen.

When I get the segfault (that's two of the cases), I have not
interrupted anything that is running with control-C.  I just checkpoint
the language interpreter shell at its command prompt (perhaps
that is what you mean by "interrupt" above, without control-C) and
then attempt to restart the checkpoint.  Then I get the segfault
on 32-bit with rc2 and rc3, which as you say below is a known issue,
and also, occasionally, with some/most checkpointed processes
on the 64-bit system with rc1, rc2, and rc3.

In the third case there is no segfault; this is where I interrupt with
control-C.  Everything works fine until I hit control-C in
the restarted process: it passes the interrupt to the checkpointed
language interpreter, which prints out its usual message, and then
the process quits.  Normally the language interpreters
(including ocaml) do not quit there, but with rc1 on the netbook
(the only one that works at all there) ocaml does quit, whereas
uncheckpointed it does not (the correct behavior).
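
To make the expected behavior concrete, here is a minimal Python sketch
(not DMTCP code, and not how ocaml itself is implemented) of a process
that installs its own SIGINT handler and therefore survives control-C.
The names on_sigint, install_handler and describe_sigint are made up for
illustration.  If the handler were lost, SIGINT would fall back to its
default action and terminate the process.

```python
import os
import signal

interrupts = []  # signals seen by our handler

def on_sigint(signum, frame):
    # Record the interrupt instead of letting the default action end the process.
    interrupts.append(signum)

def install_handler():
    """Install on_sigint for SIGINT and return the previous disposition."""
    return signal.signal(signal.SIGINT, on_sigint)

def describe_sigint():
    """Report how this process currently treats SIGINT."""
    current = signal.getsignal(signal.SIGINT)
    if current is signal.SIG_DFL:
        return "default (terminate)"
    if current is signal.SIG_IGN:
        return "ignored"
    return "handled"

if __name__ == "__main__":
    install_handler()
    os.kill(os.getpid(), signal.SIGINT)  # the same signal control-C sends
    print("still running; SIGINT is", describe_sigint())
```

Calling something like describe_sigint() before the checkpoint and again
after the restart would be one way to see whether the disposition changed.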

However, I mentioned that ocaml runs by loading bytecode into a
bytecode interpreter, by invoking a file that contains the bytecode
and a shell #! line.  For my purposes, I find it works (that is, I
can interrupt the checkpointed interpreter with control-C without it
quitting) if I launch the interpreter directly instead of the
bytecode file with the shell #! line.  So that is fine with me, but
you may still consider it a problem that one cannot run bash,
which immediately calls a program, and have the resulting
checkpointed process handle the interrupt correctly.  But it seems
there would always be a way around that for most users.
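
The workaround of launching the interpreter directly can be sketched in
Python as follows.  This is only an illustration under assumptions: the
helper name direct_argv and the path ./toplevel are hypothetical.  It
reads the #! line of a file and returns the argument list that names
the interpreter explicitly, treating the rest of the #! line as a
single argument, as Linux does.

```python
def direct_argv(script_path):
    """Return an argv that runs script_path through its #! interpreter directly."""
    # Bytecode files are binary after the first line, so read bytes.
    with open(script_path, "rb") as f:
        first = f.readline()
    if not first.startswith(b"#!"):
        return [script_path]  # no #! line: run the file as-is
    line = first[2:].decode("utf-8", "replace").strip()
    # Split into the interpreter path and at most one trailing argument.
    parts = line.split(None, 1)
    if not parts:
        return [script_path]
    return parts + [script_path]

if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "./toplevel"  # hypothetical path
    try:
        # dmtcp_launch is the DMTCP launcher; the rest is the direct command.
        print(" ".join(["dmtcp_launch"] + direct_argv(target)))
    except OSError as exc:
        print("cannot read", target, "-", exc)
```

The printed command is what one would type instead of running the #!
file itself under dmtcp_launch.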

> rc2 and rc3 have known bugs in terms of supporting 32-bit mode.
> After rc1, we changed our restart algorithm a little.  In the
> next few updates this week, we're hoping to fix the 32-bit mode.

So my own observations do fit the known reality in this
case, which is good.  I suppose 32-bit mode is not so
important to support anymore anyway.  I just happen
to still have this older netbook.

> Thank you for the further details.  We'll especially
> look into why ocaml should be more sensitive than R/python.

I guess the issue here might be running a shell first and
then the language interpreter within it.  I tried just now launching
bash and then running python (known to be okay in normal
circumstances) in the launched shell, and
then checkpointing, and the result was very strange.  The
restarted checkpoint gave the python prompt, but then
seemed unresponsive and the python process was stopped,
as if control-Z was typed.
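
A stopped state like this is what job control produces when a process
that is not in the terminal's foreground process group tries to read
from the terminal (it receives SIGTTIN, whose default action is to stop
the process).  Whether that is what happens here is only an assumption;
I have not verified it.  A small Python check, with the hypothetical
helper name in_foreground, that could be run in the python session
before the checkpoint and after the restart:

```python
import os

def in_foreground(fd=0):
    """Return True if this process group owns the terminal on fd.

    Returns False if another process group is in the foreground, and
    None if fd is not a terminal we can query.
    """
    try:
        return os.tcgetpgrp(fd) == os.getpgrp()
    except OSError:
        return None

if __name__ == "__main__":
    print("foreground on stdin:", in_foreground())
```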

You may simply not want to cover such unusual uses, which
can probably be worked around; that seems reasonable to me.

_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
