Dear all,

I am very new to DMTCP (version 2.4.4). Maybe I am making a silly mistake
in the following script, which only tests checkpointing and restart with
DMTCP and MPI. The checkpoint images are generated and can be found in the
ckpt directory, but the restart does not work. Please help me out!

Best Regards,
Rakesh

*The error on restart is:*

[51000] ERROR at fileconnection.cpp:686 in refill;
REASON='JASSERT(jalib::Filesystem::FileExists(_path)) failed'
_path = /var/spool/PBS/mom_priv/hooks/resourcedef
Message: File not found.
a.out (51000): Terminating...
=>> PBS: job killed: walltime 152 exceeded limit 120

*My script file is:*

#! /bin/bash
#PBS -N dmtcp_mpi
#PBS -q normal
#PBS -l select=2:ncpus=4
#PBS -l place=free
#PBS -l walltime=00:02:00
#PBS -j oe

echo "PBS_JOBID=$PBS_JOBID"
echo "PBS_NODEFILE=$PBS_NODEFILE"
cat "$PBS_NODEFILE"
echo "PBS_O_WORKDIR=$PBS_O_WORKDIR"

cd "$PBS_O_WORKDIR"
module load composerxe/2016.1.150
export PATH="$HOME/myapps/dmtcp/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/myapps/dmtcp/lib/dmtcp:$LD_LIBRARY_PATH"
source $HOME/myapps/batch_dmtcp.sh

### Environment variables for DMTCP
export DMTCP_CHECKPOINT_INTERVAL=60
export DMTCP_CHECKPOINT_DIR=$PWD/ckpt
export DMTCP_DL_PLUGIN=0

### Start coordinator
start_coordinator

### Checkpoint & restart
restart=1
if [[ $restart -eq 0 ]]; then
  echo "Checkpointing...restart"=$restart
  rm -rf ckpt; mkdir -p ./ckpt
  dmtcp_launch --ib --rm mpirun -f $PBS_NODEFILE -np 8 ./a.out
else
  echo "Restart...restart"=$restart
  echo " $DMTCP_CHECKPOINT_DIR/dmtcp_restart_script.sh"
  $DMTCP_CHECKPOINT_DIR/dmtcp_restart_script.sh -h $DMTCP_COORD_HOST \
    -p $DMTCP_COORD_PORT
fi
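
To rule out a problem with the checkpoint directory itself, a check along the following lines could be run before the restart branch. This is only a sketch: `check_ckpt_dir` is a hypothetical helper, not part of DMTCP, and it assumes the images follow the usual `ckpt_*.dmtcp` naming and that `dmtcp_restart_script.sh` sits in the same directory, as in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): sanity-check a checkpoint
# directory before attempting a restart.
check_ckpt_dir() {
  local dir="$1"
  local count=0
  local f

  if [[ ! -d "$dir" ]]; then
    echo "missing directory: $dir"
    return 1
  fi

  if [[ ! -e "$dir/dmtcp_restart_script.sh" ]]; then
    echo "no restart script in: $dir"
    return 1
  fi

  # Count checkpoint images; assumes the ckpt_*.dmtcp naming convention.
  for f in "$dir"/ckpt_*.dmtcp; do
    if [[ -e "$f" ]]; then
      count=$((count + 1))
    fi
  done

  if [[ "$count" -eq 0 ]]; then
    echo "no checkpoint images in: $dir"
    return 1
  fi

  echo "restart script found, $count checkpoint image(s)"
}
```

It could be called as `check_ckpt_dir "$DMTCP_CHECKPOINT_DIR" || exit 1` just before the restart script is invoked. It only confirms that the files exist; it says nothing about whether the paths recorded inside the images, such as the one in the error above, are present on the restart nodes.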
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
