Hi,

Could you replace

    dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>

with the following:

    srun dmtcp_launch --rm ./<your binary>

Also, add the following environment variables to the script:

    export OMPI_MCA_mtl=^psm
    export OMPI_MCA_btl=self,tcp

and try again?

On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <denilson...@yahoo.fr> wrote:
> Hello,
> Thanks for the response.
>
> On 10/06/2015 02:18 PM, Jiajun Cao wrote:
>
> Hi,
>
> 1. What kind of application are you running? Is there an integration of
> MATLAB and MPI? I'm asking because I haven't run any MPI-based MATLAB
> applications before.
>
> I just created a script that calculates a Fibonacci number and prints it out.
>
> 2. What kind of environment are you using? Specifically, I'd like to know
> the MPI version, interconnect network type (Ethernet or InfiniBand), and
> how MPI and Slurm are integrated (i.e., in the cluster, what command do you
> use to run the application, srun or mpirun).
>
> I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is integrated
> into the cluster environment; I used the script here:
>
> https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
>
> 3. Do you get valid checkpoint image(s)? Also, please attach your job
> scripts.
>
> I get the checkpoint images, but when I restart I receive the error I sent.
>
> Thanks
>
>
> On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <kapil.arya...@gmail.com>
> wrote:
>
>> Jiajun, Artem,
>>
>> Can one of you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <denilson...@yahoo.fr> wrote:
>>
>>> Hello,
>>>
>>> Thank you for the effort and work (DMTCP). I do have some questions
>>> (P.S.: I run my MATLAB code using --rm mpirun and Slurm):
>>>
>>> 1. Is there a good way to run MATLAB code? I created a bash file and
>>> added the following:
>>>
>>>     matlab -nojvm < file.m
>>>
>>> 2. Running the code above with DMTCP and MATLAB worked fine, but when I
>>> tried to restart it using the slurm_restart.job script from your GitHub
>>> repository with --rm mpirun, I received the following error:
>>>
>>>     restart error: cannot map initial resources into the restart allocation.
>>>     Allocated resources : nodex:4 nodey:4
>>>
>>> Any ideas? Please feel free to ask me more questions.
>>>
>>> Best regards,
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> Dmtcp-forum@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum