I think I do not understand your question. So far I have only implemented the
checkpoint part and not the restart part.

After criu_dump() the process can be left in one of three states.
Without any special handling the process is dumped and then killed.
I can also tell CRIU to leave the process stopped (--leave-stopped)
or running (--leave-running). I decided to default to --leave-running
so that after the checkpoint has been performed the process continues
running from the point where it was checkpointed.
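
As a rough sketch of those three outcomes: the enum and function names
below are made up for illustration and are not taken from the patch or
from libcriu; only the two option strings are the CRIU options named
above.

```c
/* Hypothetical names for illustration only; nothing here is part of
 * the patch or of the libcriu API. */
enum post_dump_state {
    POST_DUMP_KILLED,   /* no special handling: dumped, then killed */
    POST_DUMP_STOPPED,  /* --leave-stopped: process stays stopped   */
    POST_DUMP_RUNNING   /* --leave-running: process keeps running   */
};

/* Return the criu option that selects the given state, or an empty
 * string when no option is needed (dump and kill). */
static const char *post_dump_option(enum post_dump_state state)
{
    switch (state) {
    case POST_DUMP_STOPPED:
        return "--leave-stopped";
    case POST_DUMP_RUNNING:
        return "--leave-running";
    case POST_DUMP_KILLED:
    default:
        return "";
    }
}

/* The default described above: the process continues after the dump. */
static enum post_dump_state default_post_dump_state(void)
{
    return POST_DUMP_RUNNING;
}
```

With this mapping, post_dump_option(default_post_dump_state()) gives
"--leave-running", which matches the default I chose.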

What would be the difference between 'being restarted versus continuing
after checkpointing'? Right now only 'continuing after checkpoint' is
implemented. I do not understand how a process 'being restarted' fits
into the checkpoint function.

In opal_crs_criu_checkpoint() I am using criu_dump() to
checkpoint the process and the plan is to use criu_restore() in
opal_crs_criu_restart() (which I have not yet implemented).

On Mon, Feb 17, 2014 at 03:45:49PM -0600, Josh Hursey wrote:
> It looks fine except that the restart state is not flagged. When a process
> is restarted does it resume execution inside the criu_dump() function? If
> so, is there a way to tell from its return code (or some other mechanism)
> that it is being restarted versus continuing after checkpointing?
> 
> 
> On Mon, Feb 17, 2014 at 2:00 PM, Ralph Castain <[email protected]> wrote:
> 
> > Great - looks fine to me!!
> >
> >
> > On Feb 17, 2014, at 11:39 AM, Adrian Reber <[email protected]> wrote:
> >
> > > I have prepared a patch I would like to commit which adds the code to
> > > actually checkpoint a process. Thanks for the pointers about the string
> > > variables; I tried to implement it correctly.
> > >
> > > CRIU currently has problems with the new OOB usock but I will contact
> > > the CRIU developers about this error. Using tcp, checkpointing works.
> > >
> > > CRIU also has problems with --np > 1, but I am sure this can also be
> > > resolved.
> > >
> > > The patch is at:
> > >
> > >
> > https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=89c9c27c87598706e8f798f84fe9520ee5884492
> > >
> > >               Adrian
> > > _______________________________________________
> > > devel mailing list
> > > [email protected]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> 
> 
> 
> -- 
> Joshua Hursey
> Assistant Professor of Computer Science
> University of Wisconsin-La Crosse
> http://cs.uwlax.edu/~jjhursey
