Hi,

Could you replace

dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>

with the following:

srun dmtcp_launch --rm ./<your binary>

Also, add the following env vars to the script:

export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp

and try again?
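
Putting the two changes together, the relevant part of the job script might look roughly like the sketch below. This is only an illustration: the APP variable and the ./your_binary path are hypothetical placeholders for your binary, and the "command -v srun" guard is added here just so the script reports what it would run on a machine without Slurm. Keep whatever #SBATCH directives your existing script already has.

```shell
#!/bin/bash
# Sketch only: adapt to your own job script.

# Exclude the PSM MTL and restrict Open MPI to the self and TCP BTLs
# (the environment-variable form of "--mca btl self,tcp").
export OMPI_MCA_mtl="^psm"
export OMPI_MCA_btl="self,tcp"

# Hypothetical placeholder for the application binary.
APP="${APP:-./your_binary}"

# Launch through srun instead of mpirun.
if command -v srun >/dev/null 2>&1; then
    srun dmtcp_launch --rm "$APP"
else
    echo "srun not found; would run: srun dmtcp_launch --rm $APP"
fi
```

Setting the MCA parameters through the environment means they no longer have to be passed on an mpirun command line, which is why the mpirun invocation can be dropped.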

On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <denilson...@yahoo.fr> wrote:

> Hello,
> Thanks for the response.
>
>
> On 10/06/2015 02:18 PM, Jiajun Cao wrote:
>
> Hi,
>
>
> 1. What kind of application are you running? Is there an integration of
> matlab and mpi? I'm asking because I haven't run any mpi-based matlab
> applications before.
>
> I just created a script that calculates a Fibonacci number and prints it out.
>
> 2. What kind of environment are you using? Specifically, I'd like to know
> the MPI version, interconnect network type (Ethernet or InfiniBand), and
> how MPI and Slurm are integrated (i.e., in the cluster, what command do you
> use to run the application, srun or mpirun).
>
> I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is
> integrated in a cluster environment; I used the script here:
>
> https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
>
> 3. Do you get a valid checkpoint image(s)? Also, please attach your job
> scripts.
>
> I get the needed checkpoint, but when I restart I receive the error I sent.
>
> Thanks
>
>
> On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <kapil.arya...@gmail.com>
> wrote:
>
>> Jiajun, Artem,
>>
>> Can one of you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <denilson...@yahoo.fr> wrote:
>>
>>> Hello
>>>
>>> Thank you for the effort and work on DMTCP. I do have some questions.
>>> (P.S.: I run my MATLAB code using --rm mpirun and Slurm.)
>>>
>>> 1. Is there a good way to run MATLAB code? I created a bash file and
>>> added the following:
>>>      matlab -nojvm < file.m
>>>
>>> 2. Running the code above with DMTCP and MATLAB worked fine, but when I
>>> tried to restart it using the slurm_restart.job script from your GitHub
>>> repository with --rm mpirun, I received the following error:
>>>
>>> restart error: cannot map initial resources into the restart allocation.
>>> Allocated resources : *nodex:4  nodey:4
>>>
>>> Any ideas? Please feel free to ask me more questions.
>>>
>>> Best regards,
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> Dmtcp-forum@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>
>>
>>
>
>