By default there's no such option. All the processes in a distributed computation must be checkpointed and restarted together for consistency. However, DMTCP could easily be extended to support selective checkpointing of processes. In that case, the responsibility for ensuring consistency would fall on the application. This entails checkpointing any in-flight data between the checkpointed processes and the "external" processes, and ensuring that the checkpointed processes do not depend on the state of the external processes. If the checkpointed processes do depend on the state of the external processes, then the application must bring the external processes to the expected state prior to restarting the checkpointed processes. (Note that DMTCP takes care of these things automatically in the default mode.)
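As a rough illustration of the selective-launch idea (the alternative mentioned later in this message, where only the required ranks are started under DMTCP), here is a minimal sketch of a per-rank wrapper. It is not part of DMTCP. The set of ranks to checkpoint and the helper names are hypothetical, and it assumes that the MPI launcher exports the rank in one of the listed environment variables (Open MPI sets OMPI_COMM_WORLD_RANK; MPICH-style launchers set PMI_RANK) and that a DMTCP coordinator is reachable by `dmtcp_launch`. Whether a given MPI implementation tolerates this kind of mixed launch is untested here, and consistency with the external ranks remains the application's job.

```python
import os
import sys

# Hypothetical choice: only these MPI ranks are run under DMTCP.
CHECKPOINTED_RANKS = {0, 1}

# Environment variables in which common MPI launchers export the rank.
# Assumption: at least one of them is set for each launched process.
RANK_ENV_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "PMIX_RANK")


def current_rank(env):
    """Return the MPI rank found in env, or None if no rank variable is set."""
    for name in RANK_ENV_VARS:
        value = env.get(name)
        if value is not None:
            return int(value)
    return None


def build_command(rank, program_argv):
    """Prefix the program with dmtcp_launch only for the selected ranks."""
    if rank in CHECKPOINTED_RANKS:
        return ["dmtcp_launch"] + list(program_argv)
    # All other ranks (or a non-MPI run) start directly, outside DMTCP.
    return list(program_argv)


def main(argv):
    cmd = build_command(current_rank(os.environ), argv[1:])
    # Replace this wrapper process with the chosen command.
    os.execvp(cmd[0], cmd)


# Only exec when a program to run was actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved under a hypothetical name such as select_ranks.py, it would be started by the MPI launcher in place of the program itself, for example "mpirun -np 4 python select_ranks.py ./a.out".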
Assuming that your application can take care of everything else, the changes to the DMTCP infrastructure are not that extensive. The basic idea is to exec your program under `dmtcp_nocheckpoint`. Any program that you wish to run without DMTCP can be run as follows:

  $ dmtcp_nocheckpoint a.out

This isn't useful by itself, because you could just as easily run your program directly without DMTCP. :-) However, when a program is already running under DMTCP, any program that it forks/execs also runs under DMTCP unless `dmtcp_nocheckpoint` is specified.

Next, you could write a DMTCP plugin that checks the current MPI rank at init and, if the current process doesn't need to be checkpointed, execs into "dmtcp_nocheckpoint <mpi-program>". There's a plugin tutorial in the doc subdirectory; you could also look at the other example plugins for reference.

Another possibility is to modify the MPI process manager to launch only the required MPI ranks under DMTCP. In this case, you don't need to make any modifications to DMTCP. DMTCP would only checkpoint the processes running under its control, but you still need to ensure consistency between checkpointed and external processes as mentioned before.

On Thu, Aug 25, 2016 at 09:51:36AM -0400, Rodrigo Porfírio da Silva Sacchi wrote:
> Dears DMTCP developers,
>
> I'm a new dmtcp user and I'm using a MPI application with 4 process.
> However, I wish save checkpoint only 2 or 3 from theses processes. Or even,
> I want to save the checkpoints asynchronously. Is it possible to do this
> with DMTCP? If Yes, how?
>
> Else, how could I change DMTCP? How can you advise me?
>
> Best Regards,
> --
> ----------------------------------------------------------
> Rodrigo Sacchi
> Brazil
> ------------------------------------------------------------------------------
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum