Hi all,

I want to use DMTCP with Slurm jobs. I have set up a small test cluster with dmtcp v2.5.2 and slurm v19.05.0. To start and restart jobs I use the sample scripts 'slurm_launch.job' and 'slurm_rstr.job' in ./dmtcp-2.5.2/plugin/batch-queue/job_examples.

Command in the script to launch:

    dmtcp_launch ./counter

Command in the script to restart:

    /bin/bash ./dmtcp_restart_script.sh -h $DMTCP_COORD_HOST -p $DMTCP_COORD_PORT

For testing I use this simple shell script (./counter):

    #!/bin/bash
    i=0
    while true; do
      echo $i
      sleep 1
      let i++
    done

While the job is running in Slurm, I create a checkpoint manually with the script dmtcp_command.jobid.

Restarting the job outside of Slurm works fine. When I restart the job as a batch job in Slurm, I get the following error:

    Restart error: Cannot map initial resources into the restart allocation

Any ideas? Is there anything I am doing wrong?

Best regards
Werner
_______________________________________________ Dmtcp-forum mailing list Dmtcp-forum@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dmtcp-forum