Hi all,

I want to use DMTCP with Slurm jobs.
I have set up a small test cluster with DMTCP v2.5.2 and Slurm v19.05.0.
To start and restart jobs I use the sample scripts 'slurm_launch.job'
and 'slurm_rstr.job' from ./dmtcp-2.5.2/plugin/batch-queue/job_examples.

Command in the launch script:
dmtcp_launch ./counter

Command in the restart script:
/bin/bash ./dmtcp_restart_script.sh -h $DMTCP_COORD_HOST -p $DMTCP_COORD_PORT
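
Since the restart command depends on both coordinator variables, here is a
minimal sketch of a guard that could run before it. The function name
restart_args is hypothetical and is not part of DMTCP or the sample scripts;
the -h and -p options are simply the ones from the command above.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): print the coordinator arguments
# for the restart script, or fail if the host or port is not set.
restart_args() {
  if [ -z "${DMTCP_COORD_HOST:-}" ] || [ -z "${DMTCP_COORD_PORT:-}" ]; then
    echo "DMTCP_COORD_HOST and DMTCP_COORD_PORT must both be set" >&2
    return 1
  fi
  echo "-h $DMTCP_COORD_HOST -p $DMTCP_COORD_PORT"
}
```

It could then be used as
/bin/bash ./dmtcp_restart_script.sh $(restart_args)
where the substitution is left unquoted on purpose so that it expands to
four separate arguments.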

For testing I use this simple shell script (./counter):

#!/bin/bash
# Print an increasing counter, one value per second, until killed.
i=0
while true; do
  echo "$i"
  sleep 1
  i=$((i + 1))  # increment the counter
done

While the job is running in Slurm, I create a checkpoint manually with the
script dmtcp_command.jobid.
Restarting the job outside of Slurm works fine.
When I restart it as a batch job in Slurm, I get the following error:

Restart error: Cannot map initial resources into the restart allocation
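
For reference, the manual checkpoint step mentioned above looks roughly like
the sketch below. It assumes that the generated dmtcp_command.jobid wrapper
passes its arguments on to dmtcp_command, so that --checkpoint reaches the
coordinator. The function name request_checkpoint and the DRY_RUN switch are
hypothetical and only there for illustration.

```shell
#!/bin/bash
# Hypothetical helper: request a checkpoint through the per-job wrapper.
# Assumption: ./dmtcp_command.<jobid> forwards its arguments to dmtcp_command.
request_checkpoint() {
  local jobid="$1"
  local wrapper="./dmtcp_command.${jobid}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be run.
    echo "$wrapper --checkpoint"
  else
    "$wrapper" --checkpoint
  fi
}
```

Calling it with DRY_RUN=1 only prints the command, which makes it easy to
check the wrapper name before touching the running job.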

Any ideas?
Is there anything I am doing wrong?

Best regards
Werner



