Hey 
Thank you for the email. Is there a way to make it work, or do I have to keep
variables that "remember" the exact allocation?
 


     On Friday, October 9, 2015 4:34 AM, Artem Polyakov <artpo...@gmail.com> 
wrote:
   

 Hello,
 
 Please note that one possible reason is non-equivalent allocations. DMTCP
cannot restore processes that were originally running on the same node onto
different nodes. This means that if you originally requested the following
allocation: cn[0-1], ppn = 4, and then try to restart on cn[0-4], ppn = 2, this
won't work even though the allocations are logically equivalent.
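 To make the constraint concrete, here is a rough feasibility check. It is a hypothetical sketch, not DMTCP's actual mapping code, and `can_map` is an illustrative name. It assumes that all processes from one original node must be restarted together on a single restart node, and (as a simplification) that each restart node receives at most one original node's group. Under those assumptions, sorting both lists in descending order and comparing them pairwise is enough.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT DMTCP's real mapping logic.
# Assumptions: all processes from one original node must be restarted
# together on one restart node, and each restart node hosts at most one
# original node's group.
#
# $1: space-separated process counts per original node, e.g. "4 4"
# $2: space-separated slot counts per restart node, e.g. "2 2 2 2 2"
# Returns 0 if such a mapping exists, 1 otherwise.
can_map() {
  local -a orig slots
  local i
  # Sort both lists in descending order (the unquoted $1/$2 are split into
  # words on purpose). With a one-to-one mapping, pairing the largest group
  # with the largest node succeeds whenever any mapping does.
  read -r -a orig <<< "$(printf '%s\n' $1 | sort -rn | tr '\n' ' ')"
  read -r -a slots <<< "$(printf '%s\n' $2 | sort -rn | tr '\n' ' ')"

  # Not enough restart nodes to keep every original group together.
  if (( ${#orig[@]} > ${#slots[@]} )); then
    return 1
  fi

  # Each original group must fit entirely on its paired restart node.
  for i in "${!orig[@]}"; do
    if (( orig[i] > slots[i] )); then
      return 1
    fi
  done
  return 0
}

# The example above: cn[0-1] with ppn=4 versus cn[0-4] with ppn=2.
can_map "4 4" "2 2 2 2 2" && echo "mappable" || echo "not mappable"  # prints "not mappable"
# Two restart nodes with 4 slots each keep both groups intact.
can_map "4 4" "4 4" && echo "mappable" || echo "not mappable"        # prints "mappable"
```

 Even though both allocations offer at least 8 slots in total, no single ppn=2 node can hold a group of 4 processes, so the check fails in the same situation described above.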

2015-10-08 16:00 GMT+03:00 abderrahmane <denilson...@yahoo.fr>:

  Hello
 
 I did that and still got "Restart error: cannot map initial resources into
the restart allocation."
 
 I also tried Open MPI 1.8.8 and got the same error message.
 
 
 On 10/06/2015 07:06 PM, Jiajun Cao wrote:
  
 Hi, 
  Could you replace 
  dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>
  
  with the following: 
  srun dmtcp_launch --rm ./<your binary>
  
  Also, add the following environment variables to the script: 
  export OMPI_MCA_mtl=^psm
  export OMPI_MCA_btl=self,tcp
  
  and try again?  
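  Put together, the suggestion above amounts to something like the job-script fragment below. It is only a sketch: the node and task counts and ./your_binary are placeholders, it assumes Open MPI was built with Slurm (PMI) support so that srun can start the ranks directly, and it assumes the DMTCP coordinator has already been started and its address exported (DMTCP_COORD_HOST / DMTCP_COORD_PORT), as the slurm_launch.job example does.

```bash
#!/bin/bash
# Hypothetical fragment: node/task counts and ./your_binary are placeholders.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

# Disable the PSM MTL and restrict Open MPI to the self and TCP BTLs.
export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp

# Assumed: the coordinator was started earlier in the script (as in
# slurm_launch.job) and DMTCP_COORD_HOST / DMTCP_COORD_PORT are exported.

# Let srun start every rank under DMTCP instead of wrapping mpirun.
srun dmtcp_launch --rm ./your_binary
```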
 On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <denilson...@yahoo.fr> wrote:
 
 Hello
 Thanks for the response.
 
 
 On 10/06/2015 02:18 PM, Jiajun Cao wrote:
  
 Hi,  
  
  1. What kind of application are you running? Is it an integration of
MATLAB and MPI? I'm asking because I haven't run any MPI-based MATLAB
applications before. 
   
 I just created a script that calculates a Fibonacci number and prints it out.
 
  2. What kind of environment are you using? Specifically, I'd like to know the
MPI version, interconnect network type (Ethernet or InfiniBand), and how MPI
and Slurm are integrated (i.e., in the cluster, which command do you use to run
the application: srun or mpirun?). 
   
 I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is integrated into
the cluster environment; I used the script here:
 
https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
 
 
  3. Do you get valid checkpoint image(s)? Also, please attach your job
scripts.  
 I get the checkpoint, but when I restart I receive the error I sent.
 
 Thanks 
  
 
 On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <kapil.arya...@gmail.com> wrote:
 
 Jiajun, Artem, 
  Can one of you take a look at this one?  
  Kapil    
 On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <denilson...@yahoo.fr> wrote:
 
Hello
 
 Thank you for all the work on DMTCP. I do have some questions:
 (P.S.: I run my MATLAB code with dmtcp_launch --rm mpirun under Slurm.)
 
 1- Is there a good way to run MATLAB code? I created a bash file and
 added the following:
      matlab -nojvm < file.m
 
 2- Running the code above with DMTCP and MATLAB worked fine, but when I
 tried to restart it using the slurm_restart.job script from your GitHub
 repository, again with --rm and mpirun, I received the following error:
 
 restart error: cannot map initial resources into the restart allocation.
 Allocated resources: nodex:4 nodey:4
 
 Any ideas? Please feel free to ask me more questions.
 
 Best regards,
 
------------------------------------------------------------------------------
_______________________________________________
 Dmtcp-forum mailing list
 Dmtcp-forum@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
 

-- 
Best regards, Artem Y. Polyakov
