Dear all,
I have tried to run DMTCP with Slurm, but the DMTCP checkpoint feature
does not seem to be working.
Here is my slurm_launch.job:
#!/bin/bash
# Put your SLURM options here
#SBATCH --time=00:30:00 # put proper time of reservation here
#SBATCH -N 3 # number of nodes
#SBATCH -n 24 # total number of tasks (processes)
#SBATCH -J dmtcp_job # change to your job name
#SBATCH -o output/dmtcp-%j.out # change to proper file name or remove for defaults
start_coordinator()
{
############################################################
# For debugging when launching a custom coordinator, uncomment
# the following lines and provide the proper host and port for
# the coordinator.
############################################################
#export DMTCP_COORD_HOST=$h
#export DMTCP_COORD_PORT=$p
#return
fname=dmtcp_command.$SLURM_JOBID
echo $fname
h=`hostname`
check_coordinator=`which dmtcp_coordinator`
if [ -z "$check_coordinator" ]; then
echo "No dmtcp_coordinator found. Check your DMTCP installation and PATH settings."
exit 1
fi
dmtcp_coordinator --daemon --exit-on-last -p 0 --port-file $fname $@ 1>/dev/null 2>&1
while true; do
if [ -f "$fname" ]; then
p=`cat $fname`
if [ -n "$p" ]; then
# try to communicate ? dmtcp_command -p $p l
break
fi
fi
done
# Create dmtcp_command wrapper for easy communication with coordinator
p=`cat $fname`
chmod +x $fname
echo "#!/bin/bash" > $fname
echo >> $fname
echo "export PATH=$PATH" >> $fname
echo "export DMTCP_COORD_HOST=$h" >> $fname
echo "export DMTCP_COORD_PORT=$p" >> $fname
echo "dmtcp_command \$@" >> $fname
# Set up local environment for DMTCP
export DMTCP_COORD_HOST=$h
export DMTCP_COORD_PORT=$p
}
# change to the working directory
cd $SLURM_SUBMIT_DIR
start_coordinator -i 10 --ckptdir jobckpt
dmtcp_launch --rm mpiexec ./mm.o
###########################################END##################################
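A note on the script above: the "while true" loop that waits for the port file never gives up, and the coordinator's output is sent to /dev/null, so a coordinator that fails to start would leave the job spinning silently. Below is a minimal sketch of a bounded wait. The helper name wait_for_port_file and the 30-second default timeout are assumptions made for illustration; neither is part of DMTCP.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): wait until the coordinator's
# port file exists and is non-empty, or give up after a timeout.
wait_for_port_file() {
    local fname=$1
    local timeout=${2:-30}   # seconds; assumed default
    local waited=0
    local port
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$fname" ]; then
            port=$(cat "$fname")
            if [ -n "$port" ]; then
                # Print the port so the caller can capture it.
                echo "$port"
                return 0
            fi
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Timed out after ${timeout}s waiting for $fname" >&2
    return 1
}
```

Inside start_coordinator this could replace the loop with something like: p=$(wait_for_port_file "$fname" 30) || exit 1
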
There is only a .sh file in the jobckpt directory; no .dmtcp file is
created there.
Any idea how to solve this?
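A possible explanation, to be confirmed against "dmtcp_coordinator --help" and "dmtcp_launch --help" for the installed version: the --ckptdir option given to the coordinator may only control where the restart script is written, while the checkpoint images go to the directory given to dmtcp_launch (its own --ckptdir, or DMTCP_CHECKPOINT_DIR), defaulting to the directory the job was launched from. Under that assumption, a quick check of several candidate directories might look like the sketch below. The helper name count_ckpt_files is hypothetical and not part of DMTCP.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): report how many checkpoint
# images (*.dmtcp) and shell scripts (*.sh) each given directory holds.
count_ckpt_files() {
    local dir images scripts
    for dir in "$@"; do
        images=$(find "$dir" -maxdepth 1 -type f -name '*.dmtcp' 2>/dev/null | wc -l)
        scripts=$(find "$dir" -maxdepth 1 -type f -name '*.sh' 2>/dev/null | wc -l)
        # $((...)) normalizes any padding that wc adds on some systems.
        echo "$dir: $((images)) image(s), $((scripts)) script(s)"
    done
}
```

For this job it could be called as: count_ckpt_files jobckpt "$SLURM_SUBMIT_DIR"
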
Regards,
Husen
On Tue, Apr 19, 2016 at 11:42 AM, Husen R <hus...@gmail.com> wrote:
> Dear all,
>
> Thank you for your reply.
>
> I have now found job_examples in the DMTCP source code.
> I tried to submit a job using slurm_launch.job, but it doesn't work.
>
> I will study slurm_launch.job first and ask again if the problem is not
> resolved.
>
> Regards,
>
>
> Husen
>
> On Tue, Apr 19, 2016 at 10:09 AM, Jiajun Cao <jia...@ccs.neu.edu> wrote:
>
>> Hi Husen,
>>
>> Depending on your use case, there are two ways to integrate DMTCP with
>> Slurm:
>>
>> 1. Submitting Slurm job scripts using DMTCP: we already have the DMTCP
>> plugin for Slurm, and if you download the source code of DMTCP, some
>> example scripts can be found at:
>> plugin/batch-queue/job_examples
>>
>> 2. There's also a Slurm developer who has been working on integrating
>> DMTCP into Slurm; here is the GitHub page:
>> https://github.com/supermanue/slurm/tree/dmtcp_plugin
>>
>> Let us know if you have any other questions,
>> Jiajun
>>
>> On Mon, Apr 18, 2016 at 10:36 PM, Husen R <hus...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Is there a way to integrate DMTCP with the Slurm resource manager?
>>> Thank you in advance.
>>>
>>>
>>> regards,
>>>
>>>
>>> Husen
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Find and fix application performance issues faster with Applications
>>> Manager
>>> Applications Manager provides deep performance insights into multiple
>>> tiers of
>>> your business applications. It resolves application problems quickly and
>>> reduces your MTTR. Get your free trial!
>>> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> Dmtcp-forum@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>
>>>
>>
>