At least for me (I am not a DMTCP developer), I had to switch to
Open MPI (version 1.6 specifically) to get --rm to work correctly.
What version of MPI are you running? In addition, if you are using
InfiniBand, DMTCP's InfiniBand support will need to be installed and
enabled with --ib in order to accomplish a restart.
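
To make the suggestion above concrete, here is a rough sketch of how the launch line could be assembled. The helper name build_launch_cmd is made up for illustration and is not part of DMTCP; --rm and --ib are dmtcp_launch options, but whether --ib is usable depends on how your DMTCP was built, so treat this as an assumption to verify on your cluster. It only prints the command (a dry run) so you can inspect it before using it in the sbatch script. Running mpirun --version beforehand should tell you which MPI you have.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): print a dmtcp_launch command line
# for a job running under a resource manager such as SLURM.
#   $1   = "ib" to add --ib (InfiniBand plugin); any other value omits it
#   $2.. = the MPI launcher and program, e.g. mpiexec ./mm.o
build_launch_cmd() {
    local fabric=$1
    shift
    # --rm enables DMTCP's resource-manager support.
    local cmd="dmtcp_launch --rm"
    if [ "$fabric" = "ib" ]; then
        cmd="$cmd --ib"
    fi
    echo "$cmd $*"
}

# Dry run: show the command instead of executing it.
build_launch_cmd ib mpiexec ./mm.o
```

If the printed line looks right, it would replace the dmtcp_launch line at the end of the sbatch script; the coordinator host and port are already exported by start_coordinator, so they are left out here.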

On Wed, May 18, 2016 at 1:15 AM, Husen R <hus...@gmail.com> wrote:

> Dear all,
>
> I tried to checkpoint an MPI application using DMTCP, but it failed with
> the following error message:
>
>
> [40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
> REASON='JWARNING(false) failed'
>      _dataSockets[i]->socket().sockfd() = 9
>      buffer.size() = 0
>      WARN_INTERVAL_SEC = 10
> Message: Still draining socket... perhaps remote host is not running under
> DMTCP?
> [40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
> REASON='JWARNING(false) failed'
>      _dataSockets[i]->socket().sockfd() = 7
>      buffer.size() = 0
>      WARN_INTERVAL_SEC = 10
> Message: Still draining socket... perhaps remote host is not running under
> DMTCP?
> ......
> ......
> ......
>
> I use this sbatch script to submit the job:
>
> #####################################SBATCH###########################
> #!/bin/bash
> # Put your SLURM options here
> #SBATCH --partition=comeon
> #SBATCH --time=01:15:00
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=4
> #SBATCH --job-name="dmtcp_job"
> #SBATCH --output=dmtcp_ckpt_img/dmtcp-%j.out
>
> start_coordinator()
> {
>
>     fname=dmtcp_command.$SLURM_JOBID
>     h=$(hostname)
>     check_coordinator=$(which dmtcp_coordinator)
>
>     if [ -z "$check_coordinator" ]; then
>         echo "No dmtcp_coordinator found. Check your DMTCP installation and PATH settings."
>         exit 1
>     fi
>
>     dmtcp_coordinator --daemon --exit-on-last -p 0 --port-file $fname $@ 1>/dev/null 2>&1
>
>     p=`cat $fname`
>     chmod +x $fname
>     echo "#!/bin/bash" > $fname
>     echo >> $fname
>     echo "export PATH=$PATH" >> $fname
>     echo "export DMTCP_COORD_HOST=$h" >> $fname
>     echo "export DMTCP_COORD_PORT=$p" >> $fname
>     echo "dmtcp_command \$@" >> $fname
>
>     # Set up local environment for DMTCP
>     export DMTCP_COORD_HOST=$h
>     export DMTCP_COORD_PORT=$p
> }
>
> cd $SLURM_SUBMIT_DIR
> start_coordinator -i 240
> dmtcp_launch -h $h -p $p mpiexec ./mm.o
>
> #########################################################################
>
> I have also tried the --rm option of dmtcp_launch, but it doesn't work
> and produces no output at all.
>
> Can anybody tell me how to solve this, please? I need help.
>
>
> Regards,
>
>
>
> Husen
>
>
> ------------------------------------------------------------------------------
> Mobile security can be enabling, not merely restricting. Employees who
> bring their own devices (BYOD) to work are irked by the imposition of MDM
> restrictions. Mobile Device Manager Plus allows you to control only the
> apps on BYO-devices by containerizing them, leaving personal data
> untouched!
> https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>


-- 
William Fox

Lawrence Berkeley National Laboratory
Computational Research Division