Hi William and Husen,

As far as I know, the combination "--rm --ib" should work with the
major MPI implementations: Open MPI, MVAPICH2, Intel MPI, MPICH2.
But I'm not sure which ones we've tested with very recently.  I'm
pretty sure that we've used MVAPICH2 and Open MPI in this way.
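
For reference, here is a minimal sketch of how the launch line from the sbatch script quoted further down might look with both flags added. It assumes the coordinator host and port are already exported as DMTCP_COORD_HOST and DMTCP_COORD_PORT, as that script does. The helper name build_launch_cmd is hypothetical and only prints the command so the line can be inspected before it is run; it has not been tested against any particular MPI implementation.

```shell
#!/bin/bash
# Sketch only: build_launch_cmd is a hypothetical helper, not part of DMTCP.
# It prints a dmtcp_launch command line with the resource-manager (--rm)
# and InfiniBand (--ib) options enabled.
build_launch_cmd() {
    local host=$1 port=$2   # coordinator host and port
    shift 2                 # the remaining arguments are the MPI command
    echo "dmtcp_launch --rm --ib -h $host -p $port $*"
}

# Print the command for the job in the quoted script. The fallbacks
# (localhost, 7779) are assumptions for running this outside a batch job.
build_launch_cmd "${DMTCP_COORD_HOST:-localhost}" \
                 "${DMTCP_COORD_PORT:-7779}" mpiexec ./mm.o
```

Inside the batch job, the printed line would replace the plain "dmtcp_launch -h $h -p $p mpiexec ./mm.o" call.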

Jiajun and Rohan,

Could you confirm which implementations you've used _with the
"--rm --ib" combination_?  If it's not working with one of the major
MPI implementations, we need to fix that.

Thanks,
- Gene

On Thu, May 19, 2016 at 03:42:06PM -0700, William Fox wrote:
> At least for me (I am not a developer for dmtcp) I was forced to switch to
> openmpi (version 1.6 specifically) in order to get --rm to work correctly.
> What version of mpi are you running?  In addition, if you are using
> infiniband, --ib will need to be installed and utilized in order to
> accomplish a restart.
>
> On Wed, May 18, 2016 at 1:15 AM, Husen R <hus...@gmail.com> wrote:
>
> > dear all,
> >
> > I have tried to checkpoint mpi application using dmtcp but I failed with
> > the error message as follows :
> >
> >
> > [40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
> > REASON='JWARNING(false) failed'
> >      _dataSockets[i]->socket().sockfd() = 9
> >      buffer.size() = 0
> >      WARN_INTERVAL_SEC = 10
> > Message: Still draining socket... perhaps remote host is not running under
> > DMTCP?
> > [40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
> > REASON='JWARNING(false) failed'
> >      _dataSockets[i]->socket().sockfd() = 7
> >      buffer.size() = 0
> >      WARN_INTERVAL_SEC = 10
> > Message: Still draining socket... perhaps remote host is not running under
> > DMTCP?
> > ......
> > ......
> > ......
> >
> > I use this sbatch script to submit job :
> >
> > #####################################SBATCH###########################
> > #!/bin/bash
> > # Put your SLURM options here
> > #SBATCH --partition=comeon
> > #SBATCH --time=01:15:00
> > #SBATCH --nodes=2
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --job-name="dmtcp_job"
> > #SBATCH --output=dmtcp_ckpt_img/dmtcp-%j.out
> >
> > start_coordinator()
> > {
> >
> >     fname=dmtcp_command.$SLURM_JOBID
> >     h=$(hostname)
> >     check_coordinator=$(which dmtcp_coordinator)
> >
> >     if [ -z "$check_coordinator" ]; then
> >         echo "No dmtcp_coordinator found. Check your DMTCP installation and PATH settings."
> >         exit 0
> >     fi
> >
> >     dmtcp_coordinator --daemon --exit-on-last -p 0 --port-file $fname $@ 1>/dev/null 2>&1
> >
> >     p=`cat $fname`
> >     chmod +x $fname
> >     echo "#!/bin/bash" > $fname
> >     echo >> $fname
> >     echo "export PATH=$PATH" >> $fname
> >     echo "export DMTCP_COORD_HOST=$h" >> $fname
> >     echo "export DMTCP_COORD_PORT=$p" >> $fname
> >     echo "dmtcp_command \$@" >> $fname
> >
> >     # Set up local environment for DMTCP
> >     export DMTCP_COORD_HOST=$h
> >     export DMTCP_COORD_PORT=$p
> > }
> >
> > cd $SLURM_SUBMIT_DIR
> > start_coordinator -i 240
> > dmtcp_launch -h $h -p $p mpiexec ./mm.o
> >
> > #########################################################################
> >
> > I also have tried using --rm option in dmtcp_launch but it doesn't work
> > and no output at all.
> >
> > anybody tell me how to solve this please ? I need help
> >
> >
> > Regards,
> >
> >
> > Husen
> >
> > _______________________________________________
> > Dmtcp-forum mailing list
> > Dmtcp-forum@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> >
>
> --
> William Fox
>
> Lawrence Berkeley National Laboratory
> Computational Research Division