Just replied to your other email before seeing this. Take a look at those comments and let me know if that helps differentiate those interfaces.
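
In case it helps, here is a rough sketch of the two ways out of opal_crs.checkpoint() that are described in the quoted thread below. It is not the actual Open MPI code: every sketch_* and SKETCH_* name is a hypothetical stand-in invented for illustration (the real states live in opal/mca/crs/crs.h), and the "backend result" is an assumed abstraction of whatever the underlying checkpointer reports when it returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OPAL_CRS_* states discussed in this thread.
 * These are NOT the real definitions from opal/mca/crs/crs.h. */
typedef enum {
    SKETCH_CRS_ERROR,
    SKETCH_CRS_CONTINUE,  /* checkpoint taken, still in the original process */
    SKETCH_CRS_RESTART    /* execution resumed from a checkpoint image */
} sketch_crs_state_t;

/* Assumed abstraction of what the underlying checkpointer tells us when
 * control comes back from it. */
typedef enum {
    SKETCH_BACKEND_FAILED,
    SKETCH_BACKEND_SAVED,   /* returned in the original process */
    SKETCH_BACKEND_RESUMED  /* returned in a process rebuilt from the image */
} sketch_backend_result_t;

/* Sketch of the tail end of a CRS component's checkpoint() function:
 * the same call site is left in one of two situations, and the component
 * reports which one through *state. Returns 0 on success, -1 on error. */
static int sketch_checkpoint_epilogue(sketch_backend_result_t result,
                                      sketch_crs_state_t *state)
{
    assert(state != NULL);

    switch (result) {
    case SKETCH_BACKEND_SAVED:
        *state = SKETCH_CRS_CONTINUE;
        return 0;
    case SKETCH_BACKEND_RESUMED:
        *state = SKETCH_CRS_RESTART;
        return 0;
    default:
        *state = SKETCH_CRS_ERROR;
        return -1;
    }
}

/* Sketch of the caller's side: pick the follow-up hook from the state.
 * Returns the name of the hook that would be invoked next. */
static const char *sketch_followup_hook(sketch_crs_state_t state)
{
    switch (state) {
    case SKETCH_CRS_CONTINUE:
        return "continue";
    case SKETCH_CRS_RESTART:
        return "restart";
    default:
        return "error";
    }
}
```

The point of the sketch is only that restart is not requested through checkpoint(); the function simply can be exited in two different processes, and the state value is how the caller tells them apart.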

On Tue, Feb 18, 2014 at 5:28 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> opal_crs.checkpoint() is not used to restart the process, but it does
> return in two different cases:
>
> - in the "continue" case, opal_crs.checkpoint() returns in the original
>   process and keeps executing the same process and then, IIRC, invokes
>   opal_crs.continue().
>
> - in the "restart" case, opal_crs.checkpoint() returns into a new process
>   and then, IIRC, invokes opal_crs.restart().
>
>
> On Feb 18, 2014, at 5:29 AM, Adrian Reber <adr...@lisas.de> wrote:
>
> > I should have read this email before answering the other.
> >
> > So opal_crs.checkpoint() is used to checkpoint the process as well as
> > restart the process? I would have expected opal_crs.restart() is used
> > for restart. I am confused. Looking at CRS/BLCR checkpoint() seems to
> > only checkpoint and restart() seems to only restart. The comment in
> > opal/mca/crs/crs.h says the same as you say.
> >
> >
> > On Mon, Feb 17, 2014 at 03:43:08PM -0600, Josh Hursey wrote:
> >> These values indicate the current state of the checkpointing lifecycle. In
> >> particular CONTINUE/RESTART are set by the checkpointer in the CRS (all
> >> others are used by the INC mechanism). In the opal_crs.checkpoint() call
> >> the checkpointer will capture the program state and it is possible to
> >> emerge from this function in one of two scenarios. Either we are continuing
> >> execution in the original process (Continue state), or we are resuming
> >> execution from a checkpointed state (Restart state).
> >>
> >> So if the checkpoint was successful, and you are not restarting the process
> >> then you want OPAL_CRS_CONTINUE.
> >>
> >> If the process is being restarted from a checkpoint file, then we should
> >> emerge from this function setting the state to OPAL_CRS_RESTART.
> >>
> >> The OPAL_CR_CHECKPOINT state is used in the INC mechanism to notify all of
> >> the components to prepare for checkpoint (we probably should have called it
> >> OPAL_CR_PREPARE_FOR_CKPT). So not really used by the CRS mechanisms at all.
> >> You can see it used in the opal_cr_inc_core_prep() function in
> >> opal/runtime/opal_cr.c
> >>
> >> -- Josh
> >>
> >>
> >> On Mon, Feb 17, 2014 at 9:28 AM, Adrian Reber <adr...@lisas.de> wrote:
> >>
> >>> This is probably for Josh. What is the meaning of the OPAL_CRS_* enums?
> >>>
> >>> They are probably used to communicate the state of the CRS modules.
> >>> OPAL_CRS_ERROR seems to be used in case an error happened. What is the
> >>> CRS module supposed to set this to if the checkpoint was successful.
> >>>
> >>> OPAL_CRS_CONTINUE or OPAL_CRS_CHECKPOINT?
> >>>
> >>> Adrian
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>
> >>
> >> --
> >> Joshua Hursey
> >> Assistant Professor of Computer Science
> >> University of Wisconsin-La Crosse
> >> http://cs.uwlax.edu/~jjhursey
> >
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey