Hi Jiajun, all,

Sorry if I didn't explain myself clearly.

What you are explaining is great. I am already using it, successfully
checkpointing both serial and MPI applications. My idea, however,
goes a step further.

The Slurm configuration (parameter CheckpointType, see
https://computing.llnl.gov/linux/slurm/slurm.conf.html ) lets you specify
a checkpointing mechanism. Slurm commands and the Slurm API can then use
it to perform checkpoint-based operations, such as making a checkpoint or
restarting from it. This is also an advantage for the end user, who
doesn't need a specific script and can use standard Slurm commands to
submit and checkpoint jobs.
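
To illustrate the kind of standard commands I mean, here is a rough
Python sketch that only builds "scontrol checkpoint" command lines for a
job. The helper name and the job id are made up for the example, and it
assumes a Slurm version whose scontrol offers the checkpoint subcommand
and a CheckpointType plugin that is actually configured:

```python
# Sketch only: builds the command line, it does not contact Slurm.
# Assumption: scontrol accepts "checkpoint <operation> <job_id>".
ALLOWED_OPS = ("create", "vacate", "restart")


def scontrol_checkpoint_cmd(operation, job_id):
    """Return the argv list for 'scontrol checkpoint <operation> <job_id>'.

    Hypothetical helper for illustration; not part of Slurm or DMTCP.
    """
    if operation not in ALLOWED_OPS:
        raise ValueError("unsupported checkpoint operation: %s" % operation)
    return ["scontrol", "checkpoint", operation, str(job_id)]


if __name__ == "__main__":
    # 1234 is a placeholder job id.
    print(" ".join(scontrol_checkpoint_cmd("create", 1234)))
    print(" ".join(scontrol_checkpoint_cmd("restart", 1234)))
```

Actually running such a command needs a working checkpoint plugin behind
it, which is the piece I am missing for DMTCP.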

Looking at the Slurm source code and documentation, I got the
impression that DMTCP is not supported out of the box. So my question
is whether you know how to integrate the two in this way.

Best regards,


Manuel




2015-05-21 23:50 GMT+02:00 Jiajun Cao <jia...@ccs.neu.edu>:
> Hi Manuel,
>
>   If I understand it correctly, what you want is to run your application
> with DMTCP support under Slurm. Is that right?
> If so, we already have the plugin for the support of Slurm. The source code
> is in the plugin/batch-queue directory, and it is
> compiled by default. To enable it, try dmtcp_launch with the --rm option. In
> the plugin/batch-queue/job_examples directory,
> there are some submission scripts we provide for running jobs under Slurm.
>
>   Hope that helps. Let me know if you have any further questions.
>
> Best,
> Jiajun
>
> On Thu, May 21, 2015 at 11:04 AM, Manuel Rodríguez Pascual
> <manuel.rodriguez.pasc...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am (unsuccessfully) trying to integrate DMTCP with Slurm.
>>
>> On the first step, employing the scripts provided with DMTCP, I have
>> succeeded and it is now working, both with serial tasks and MPICH3
>> ones. This is great news :)
>>
>> However, I would now like to use this library from the Slurm API. To do
>> so, I guess I'll have to integrate DMTCP as a plugin and then specify
>> it in slurm.conf (variable "CheckpointType=checkpoint/XXXX"). Is this
>> possible? I have looked inside the Slurm code and it doesn't seem to have
>> support out of the box, but I was imagining that maybe you have
>> provided it some way or another.
>>
>> Thanks for your help,
>>
>>
>> Manuel
>>
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Dmtcp-forum mailing list
>> Dmtcp-forum@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
