Hi Husen,

Just to let you know: as Jiajun said, I am currently working on a full
integration between DMTCP and Slurm. It should be ready soon.

Best regards,


Manuel
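
A side note on the quoted script below: its `while true` loop polls for the coordinator's port file without sleeping and without a timeout, so it spins a CPU core and never returns if the coordinator fails to start. The following is only a minimal sketch of a bounded wait; `wait_for_port_file` and its 30-second default are hypothetical names and values chosen for illustration, not part of DMTCP or of the original job script.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): wait until the coordinator has
# written a non-empty port file, giving up after a timeout in seconds.
wait_for_port_file() {
    local fname=$1
    local timeout=${2:-30}   # assumed default, adjust as needed
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # -s is true only if the file exists and is non-empty
        if [ -s "$fname" ]; then
            cat "$fname"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Timed out waiting for $fname" >&2
    return 1
}
```

Inside `start_coordinator`, the loop could then be replaced with something like `p=$(wait_for_port_file "$fname" 30) || exit 1`, so that the job fails visibly instead of hanging.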

2016-04-19 10:02 GMT+02:00 Husen R <hus...@gmail.com>:
> Dear all,
>
> I have tried to run DMTCP with Slurm, but the DMTCP checkpoint
> feature does not seem to be working.
> Here is my slurm_launch.job:
>
>
> #!/bin/bash
> # Put your SLURM options here
> #SBATCH --time=00:30:00           # put proper time of reservation here
> #SBATCH -N 3                      # number of nodes
> #SBATCH -n 24                     # total number of tasks (processes)
> #SBATCH -J dmtcp_job              # change to your job name
> #SBATCH -o output/dmtcp-%j.out    # change to proper file name or remove for defaults
>
>
>
> start_coordinator()
> {
>     ############################################################
>     # For debugging when launching a custom coordinator, uncomment
>     # the following lines and provide the proper host and port for
>     # the coordinator.
>     ############################################################
>     #export DMTCP_COORD_HOST=$h
>     #export DMTCP_COORD_PORT=$p
>     #return
>
>     fname=dmtcp_command.$SLURM_JOBID
>     echo $fname
>     h=`hostname`
>
>     check_coordinator=`which dmtcp_coordinator`
>     if [ -z "$check_coordinator" ]; then
>         echo "No dmtcp_coordinator found. Check your DMTCP installation and PATH settings."
>         exit 1   # non-zero status so the failure is visible to Slurm
>     fi
>
>     dmtcp_coordinator --daemon --exit-on-last -p 0 --port-file $fname "$@" 1>/dev/null 2>&1
>
>     while true; do
>         if [ -f "$fname" ]; then
>             p=`cat $fname`
>             if [ -n "$p" ]; then
>                 # try to communicate ? dmtcp_command -p $p l
>                 break
>             fi
>         fi
>     done
>
>     # Create dmtcp_command wrapper for easy communication with coordinator
>     p=`cat $fname`
>     chmod +x $fname
>     echo "#!/bin/bash" > $fname
>     echo >> $fname
>     echo "export PATH=$PATH" >> $fname
>     echo "export DMTCP_COORD_HOST=$h" >> $fname
>     echo "export DMTCP_COORD_PORT=$p" >> $fname
>     echo "dmtcp_command \$@" >> $fname
>
>     # Set up local environment for DMTCP
>     export DMTCP_COORD_HOST=$h
>     export DMTCP_COORD_PORT=$p
> }
>
> # changedir to workdir
> cd $SLURM_SUBMIT_DIR
>
> start_coordinator -i 10 --ckptdir jobckpt
> dmtcp_launch --rm mpiexec ./mm.o
>
> ###########################################END##################################
>
>
> There is only a .sh file in the jobckpt directory; there is no .dmtcp file
> in that directory.
>
> Any idea how to solve this?
>
> Regards,
>
>
> Husen
>
> On Tue, Apr 19, 2016 at 11:42 AM, Husen R <hus...@gmail.com> wrote:
>>
>> Dear all,
>>
>> Thank you for your reply.
>>
>> I have now found job_examples in the DMTCP source code.
>> I tried to submit a job using slurm_launch.job, but it doesn't work.
>>
>> I will study slurm_launch.job first and ask again if the problem is not
>> resolved.
>>
>> Regards,
>>
>>
>> Husen
>>
>> On Tue, Apr 19, 2016 at 10:09 AM, Jiajun Cao <jia...@ccs.neu.edu> wrote:
>>>
>>> Hi Husen,
>>>
>>> Depending on your use case, there are two ways to integrate DMTCP with
>>> Slurm:
>>>
>>> 1. Submitting Slurm job scripts using DMTCP: we already have the DMTCP
>>> plugin for Slurm, and if you download the DMTCP source code, some example
>>> scripts can be found at:
>>>     plugin/batch-queue/job_examples
>>>
>>> 2. There is also a Slurm developer who has been working on integrating
>>> DMTCP into Slurm. Here is the GitHub page:
>>>     https://github.com/supermanue/slurm/tree/dmtcp_plugin
>>>
>>> Let us know if you have any other questions,
>>> Jiajun
>>>
>>> On Mon, Apr 18, 2016 at 10:36 PM, Husen R <hus...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Is there a way to integrate DMTCP with the Slurm resource manager?
>>>> Thank you in advance.
>>>>
>>>>
>>>> regards,
>>>>
>>>>
>>>> Husen
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Find and fix application performance issues faster with Applications
>>>> Manager
>>>> Applications Manager provides deep performance insights into multiple
>>>> tiers of
>>>> your business applications. It resolves application problems quickly and
>>>> reduces your MTTR. Get your free trial!
>>>> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
>>>> _______________________________________________
>>>> Dmtcp-forum mailing list
>>>> Dmtcp-forum@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>
>>>
>>
>
