DMTCP has no option for this by default. All the processes in a
distributed computation must be checkpointed and restarted together for
consistency. However, DMTCP could easily be extended to checkpoint
processes selectively. In that case, the responsibility for ensuring
consistency falls on the application. This entails checkpointing of
any in-flight data between the checkpointed processes and the "external"
processes, and ensuring that the checkpointed processes are not dependent
on the state of external processes. If the checkpointed processes depend
on the state of the external processes, then the application must bring
the external processes to the expected state prior to restarting the
checkpointed processes. (Note that these things are taken care of by
DMTCP automatically in the default mode.)

Assuming that your application can take care of everything else, the
changes to the DMTCP infrastructure are not that extensive. The basic
idea is to exec your program under `dmtcp_nocheckpoint`. Any program
that you wish to run without DMTCP can be run as follows:

  $ dmtcp_nocheckpoint a.out

This isn't useful on its own, because you could just as easily run your
program directly without DMTCP. :-) However, when a program is already
running under DMTCP, any program that it forks/execs also runs under
DMTCP unless it is launched through `dmtcp_nocheckpoint`.

Next, you could write a DMTCP plugin that checks for the current MPI rank
at init, and, if the current process doesn't need to be checkpointed,
execs into "dmtcp_nocheckpoint <mpi-program>". There's a plugin tutorial
in the doc subdirectory; you could also look at other example plugins
for reference.
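
To make the rank check concrete, here is a minimal sketch of the decision
such a plugin would make at init. It is written in Python rather than
against the DMTCP plugin API, so treat it as an illustration only. It
assumes that your MPI launcher exports the rank in OMPI_COMM_WORLD_RANK or
PMI_RANK (check what yours actually sets), and the helper names and the
"ranks 0 and 1" policy are hypothetical.

```python
import os
import sys

# Assumption: the MPI launcher exports the rank in one of these variables
# (Open MPI sets OMPI_COMM_WORLD_RANK; PMI-based launchers set PMI_RANK).
RANK_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK")


def current_rank(environ):
    """Return the MPI rank found in environ, or None if no variable is set."""
    for name in RANK_VARS:
        if name in environ:
            return int(environ[name])
    return None


def nocheckpoint_argv(rank, checkpointed_ranks, program_argv):
    """Return the argv to exec for an external rank, or None to stay under DMTCP."""
    if rank is None or rank in checkpointed_ranks:
        return None
    return ["dmtcp_nocheckpoint"] + list(program_argv)


if __name__ == "__main__":
    # Hypothetical policy: only ranks 0 and 1 are checkpointed.
    argv = nocheckpoint_argv(current_rank(os.environ), {0, 1}, sys.argv[1:])
    if argv is not None and len(argv) > 1:
        os.execvp(argv[0], argv)  # replaces this process; does not return
```

A rank that is in the checkpointed set gets None back and simply keeps
running under DMTCP; every other rank re-execs itself through
`dmtcp_nocheckpoint`.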

Another possibility is to modify the MPI process manager to launch only
the required MPI ranks under DMTCP. In this case, you don't need to make
any modifications to DMTCP. DMTCP would only checkpoint the processes
running under its control, but you still need to ensure consistency
between checkpointed and external processes as mentioned before.
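
If you would rather not touch the process manager itself, one way to
approximate this is the colon-separated MPMD syntax that mpirun accepts.
The sketch below only builds such a command line. It assumes that ranks
are assigned to the application contexts in the order they are listed, and
that your MPI implementation tolerates a mix of `dmtcp_launch` and plain
processes; both need to be verified on your setup. The function name is
hypothetical.

```python
import shlex


def mpmd_command(total_ranks, checkpointed_ranks, program_argv, launcher="mpirun"):
    """Build an MPMD command line with selected ranks prefixed by dmtcp_launch."""
    # Group consecutive ranks that get the same treatment into one app context.
    groups = []  # each entry is [under_dmtcp, count]
    for rank in range(total_ranks):
        under_dmtcp = rank in checkpointed_ranks
        if groups and groups[-1][0] == under_dmtcp:
            groups[-1][1] += 1
        else:
            groups.append([under_dmtcp, 1])

    argv = [launcher]
    for i, (under_dmtcp, count) in enumerate(groups):
        if i > 0:
            argv.append(":")  # separates MPMD application contexts
        argv += ["-np", str(count)]
        if under_dmtcp:
            argv.append("dmtcp_launch")
        argv += list(program_argv)
    return argv


# Four ranks, of which ranks 0 and 1 run under DMTCP.
cmd = mpmd_command(4, {0, 1}, ["./a.out"])
print(shlex.join(cmd))
```

For the four-process case above, this produces one context of two ranks
under `dmtcp_launch` followed by one context of two plain ranks.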

On Thu, Aug 25, 2016 at 09:51:36AM -0400, Rodrigo Porfírio da Silva Sacchi wrote:
> Dear DMTCP developers,
> 
> I'm a new DMTCP user and I'm running an MPI application with 4 processes.
> However, I wish to checkpoint only 2 or 3 of these processes. Or even
> save the checkpoints asynchronously. Is it possible to do this
> with DMTCP? If yes, how?
> 
> Otherwise, how could I change DMTCP? What would you advise?
> 
> Best Regards,
> -- 
> ----------------------------------------------------------
> Rodrigo Sacchi
> Brazil

> ------------------------------------------------------------------------------

> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum

