I think what Artem meant was you need to keep the allocations consistent
before checkpoint and after restart. For instance,
if you use 4 nodes * 2 processesPerNode before checkpoint, you should
specify the same configuration in the restart script.
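
One way to "remember" the shape is to record it when the job is first launched and compare it before restarting. The following is only a minimal sketch under stated assumptions: the helper names save_alloc_shape and check_alloc_shape and the file alloc_shape.txt are hypothetical and are not part of DMTCP or Slurm.

```shell
#!/bin/bash
# Hypothetical helpers (not part of DMTCP or Slurm): record the allocation
# shape at launch time and verify it before a restart is attempted.

# Assumed file name; override by setting ALLOC_FILE before calling.
ALLOC_FILE="${ALLOC_FILE:-alloc_shape.txt}"

# Write "<nodes> <procs per node>" to the record file.
save_alloc_shape() {
    printf '%s %s\n' "$1" "$2" > "$ALLOC_FILE"
}

# Return 0 only if the given shape matches the recorded one.
# Return 2 if no record exists, 1 on a mismatch.
check_alloc_shape() {
    [ -r "$ALLOC_FILE" ] || return 2
    read -r saved_nodes saved_ppn < "$ALLOC_FILE"
    if [ "$1" != "$saved_nodes" ] || [ "$2" != "$saved_ppn" ]; then
        echo "allocation mismatch: had ${saved_nodes}x${saved_ppn}, now ${1}x${2}" >&2
        return 1
    fi
    return 0
}
```

In the launch script this could be called as save_alloc_shape "$SLURM_JOB_NUM_NODES" "$SLURM_NTASKS_PER_NODE", and in the restart script as check_alloc_shape with the same two variables before restarting. Note that Slurm sets SLURM_NTASKS_PER_NODE only when --ntasks-per-node is given, so both job scripts would need to request it explicitly. Node names are not compared, since only the node count and the processes per node have to match.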

On Fri, Oct 9, 2015 at 10:06 AM, Artem Polyakov <artpo...@gmail.com> wrote:

> P.S. There is no way to avoid that for now or in the near future, IMO.
>
> 2015-10-09 17:01 GMT+03:00 Artem Polyakov <artpo...@gmail.com>:
>
>> You don't need the "exact" allocation in terms of node names, but you do need
>> to remember how many nodes and how many procs per node you had in the original
>> allocation.
>>
>> 2015-10-09 16:39 GMT+03:00 MR.AB <denilson...@yahoo.fr>:
>>
>>> Hey,
>>> Thank you for the email. Is there a way to make it work, or do I have to
>>> have variables to "remember" the exact allocations?
>>>
>>>
>>>
>>> On Friday, October 9, 2015 4:34 AM, Artem Polyakov <artpo...@gmail.com>
>>> wrote:
>>>
>>>
>>> Hello,
>>> Please note that one of the reasons may be non-equivalent allocations.
>>> DMTCP cannot restore processes that were originally running on the same node
>>> onto different nodes. This means that if you originally requested the
>>> following allocation: cn[0-1], ppn = 4
>>> and try to restart on cn[0-3], ppn = 2
>>> this won't work even though the allocations are logically equivalent.
>>>
>>> 2015-10-08 16:00 GMT+03:00 abderrahmane <denilson...@yahoo.fr>:
>>>
>>> Hello
>>>
>>> I did it and still got the restart error: cannot map initial resources into
>>> the restart allocation.
>>>
>>> I also used Open MPI 1.8.8 and got the same error message.
>>>
>>>
>>> On 10/06/2015 07:06 PM, Jiajun Cao wrote:
>>>
>>> Hi,
>>>
>>> Could you replace
>>>
>>> dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>
>>>
>>> with the following:
>>>
>>> srun dmtcp_launch --rm ./<your binary>
>>>
>>> Also, add the following env vars to the script:
>>>
>>> export OMPI_MCA_mtl=^psm
>>> export OMPI_MCA_btl=self,tcp
>>>
>>> and try again?
>>>
>>> On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <denilson...@yahoo.fr>
>>> wrote:
>>>
>>> Hello
>>> Thanks for the response.
>>>
>>>
>>> On 10/06/2015 02:18 PM, Jiajun Cao wrote:
>>>
>>> Hi,
>>>
>>>
>>> 1. What kind of application are you running? Is there an integration of
>>> matlab and mpi? I'm asking because I haven't run any mpi-based matlab
>>> applications before.
>>>
>>> I just created a script that calculates a Fibonacci number and prints it out.
>>>
>>> 2. What kind of environment are you using? Specifically, I'd like to
>>> know the MPI version, interconnect network type (Ethernet or InfiniBand),
>>> and how MPI and Slurm are integrated (i.e., in the cluster, what command do
>>> you use to run the application, srun or mpirun).
>>>
>>> I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is
>>> integrated in a cluster environment; I used the script here:
>>>
>>> https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
>>>
>>> 3. Do you get a valid checkpoint image(s)? Also, please attach your job
>>> scripts.
>>>
>>> I get the checkpoint I need, but when I restart I receive the error I
>>> sent.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <kapil.arya...@gmail.com> wrote:
>>>
>>> Jiajun, Artem,
>>>
>>> Can one of you take a look at this one?
>>>
>>> Kapil
>>>
>>> On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <denilson...@yahoo.fr> wrote:
>>>
>>> Hello
>>>
>>> Thank you for the effort and work on DMTCP. I do have some questions.
>>> (P.S.: I run my MATLAB code using --rm mpirun and Slurm.)
>>>
>>> 1- Is there a good way to run MATLAB code? I created a bash file and
>>> added the following:
>>>      matlab -nojvm < file.m
>>>
>>> 2- Running the code above with DMTCP and MATLAB worked fine, but when I
>>> tried to restart it using the slurm_restart.job script from your GitHub
>>> repository with --rm mpirun, I received the following error:
>>>
>>> restart error: cannot map initial resources into the restart allocation.
>>> Allocated resources : *nodex:4  nodey:4
>>>
>>> Any ideas? Please feel free to ask me more questions.
>>>
>>> Best regards,
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> Dmtcp-forum@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> С Уважением, Поляков Артем Юрьевич
>>> Best regards, Artem Y. Polyakov
>>>
>>>
>>>
>>
>>
>> --
>> С Уважением, Поляков Артем Юрьевич
>> Best regards, Artem Y. Polyakov
>>
>
>
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov
>