Hi Jiajun, all,

Sorry if I didn't explain myself clearly.

What you are describing is great. I am already using it, successfully checkpointing both serial and MPI applications.

My idea, however, goes a step further. The Slurm configuration ( https://computing.llnl.gov/linux/slurm/slurm.conf.html , parameter CheckpointType) allows a checkpointing mechanism to be specified. Slurm commands and the Slurm API can then use it to perform checkpoint-based operations, such as taking a checkpoint or restarting from one. It is also an advantage for the end user, who doesn't need a specific script and can use standard Slurm commands to submit and checkpoint jobs.

Looking at the Slurm source code and documentation, I got the impression that DMTCP is not supported out of the box. So my question is whether you know how to integrate the two in this way.

Best regards,

Manuel

2015-05-21 23:50 GMT+02:00 Jiajun Cao <jia...@ccs.neu.edu>:
> Hi Manuel,
>
> If I understand it correctly, what you want is to run your application
> with DMTCP support under Slurm. Is that right?
> If so, we already have the plugin for the support of Slurm. The source code
> is in the plugin/batch-queue directory, and it is
> compiled by default. To enable it, try dmtcp_launch with the --rm option. In
> the plugin/batch-queue/job_examples directory,
> there are some submission scripts we provide for running jobs under Slurm.
>
> Hope that helps. Let me know if you have any further questions.
>
> Best,
> Jiajun
>
> On Thu, May 21, 2015 at 11:04 AM, Manuel Rodríguez Pascual
> <manuel.rodriguez.pasc...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am (unsuccessfully) trying to integrate DMTCP with Slurm.
>>
>> On the first step, employing the scripts provided with DMTCP, I have
>> succeeded and it is now working, both with serial tasks and MPICH3
>> ones. This is great news :)
>>
>> However, I would now like to employ this library from the Slurm API.
>> To do so, I guess I'll have to integrate DMTCP as a plugin, and then specify
>> it in slurm.conf (variable "CheckpointType=checkpoint/XXXX"). Is this
>> possible? I have looked inside the Slurm code and it doesn't seem to have
>> support out of the box, but I was imagining that maybe you have
>> provided it some way or another.
>>
>> Thanks for your help,
>>
>> Manuel
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>> _______________________________________________
>> Dmtcp-forum mailing list
>> Dmtcp-forum@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum