Any suggestions?

From: basmaabdelaz...@hotmail.com
To: ka...@ccs.neu.edu
Subject: RE: [Dmtcp-forum] OpenMPI program Checkpoint restart
Date: Tue, 1 Oct 2013 02:04:58 +0200




Thank you for your reply.

OpenMPI version: 1.6.5
DMTCP version: dmtcp-1.2.8
gcc version: gcc-4.7
Linux: Ubuntu, kernel version 3.5.0-28-generic
My program is the MPI/C Integer Sort benchmark from the NAS Parallel Benchmarks (NPB) 3.3, which runs 
using 4 processes.

Is this the right way, and the only way, to restart an OpenMPI program using DMTCP? 
Or what did I do wrong?
And can I also try an MPI/Fortran program?


Thank you

From: ka...@ccs.neu.edu
Date: Sun, 29 Sep 2013 11:42:29 -0400
Subject: Re: [Dmtcp-forum] OpenMPI program Checkpoint restart
To: basmaabdelaz...@hotmail.com
CC: dmtcp-forum@lists.sourceforge.net

Hi,
Thank you for contacting us.
Could you provide us with more information about the OpenMPI version that you are 
using? Please also include the DMTCP, libc, gcc, and kernel versions.


Thanks,
Kapil

On Sat, Sep 28, 2013 at 6:49 PM, basma a.azeem <basmaabdelaz...@hotmail.com> 
wrote:

I need to use DMTCP to checkpoint and restart an OpenMPI program.
DMTCP is installed on my machine 
and OpenMPI runs normally.

So I ran the following command:

:~$ dmtcp_checkpoint mpirun -np 4 /home/basma/NPB3.3/NPB3.3/NPB3.3-MPI/bin/is.A.4


My program is the MPI/C Integer Sort benchmark from the NAS Parallel Benchmarks, which runs using 4 
processes.

Then I created a manual checkpoint using the dmtcp_coordinator.

That left the following files in my home folder:


ckpt_orterun_5721e6a7ff40367d-2937-52471e26.dmtcp
ckpt_is.A.4_5721e6a7ff40367d-2942-52471e26.dmtcp
ckpt_is.A.4_5721e6a7ff40367d-2944-52471e26.dmtcp
ckpt_is.A.4_5721e6a7ff40367d-2947-52471e26.dmtcp
ckpt_is.A.4_5721e6a7ff40367d-2950-52471e26.dmtcp
dmtcp_restart_script.sh
dmtcp_restart_script_5721e6a7ff40367d-2937-52471e26.sh

I used the following command to restart:

basma@basma-Satellite-A500:~$  ./dmtcp_restart_script.sh



dmtcp_checkpoint (DMTCP + MTCP) 1.2.8
Copyright (C) 2006-2011  Jason Ansel, Michael Rieker, Kapil Arya, and
                                                       Gene Cooperman
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING file for details.
(Use flag "-q" to hide this message.)

[3398] ERROR at connection.cpp:1137 in restore; 
REASON='JASSERT(jalib::Filesystem::FileExists(_path) == false) failed'
     _path = /run/shm/open_mpi.0001
Message: 
**** File already exists! Checkpointed copy can't be restored.
****Delete the existing file and try again!
dmtcp_restart (3398): Terminating...


Which file should I use to restart the OpenMPI program, and which command?
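
The error message above asks for the existing file to be deleted before retrying. A minimal sketch of that step follows. It assumes the original mpirun job is no longer running and that the file is a leftover from it. The helper name remove_stale is made up for illustration and is not part of DMTCP.

```shell
#!/bin/sh
# Hypothetical helper (not a DMTCP command): delete a leftover file
# that dmtcp_restart refuses to overwrite, and report what was done.
remove_stale() {
    path=$1
    if [ -e "$path" ]; then
        echo "removing stale file: $path"
        rm -f -- "$path"
    else
        echo "nothing to remove: $path"
    fi
}
```

With the path taken from the error output, this might be used as `remove_stale /run/shm/open_mpi.0001 && ./dmtcp_restart_script.sh`. Whether the restart then succeeds is not confirmed anywhere in this thread.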





                                          

------------------------------------------------------------------------------

October Webinars: Code for Performance

Free Intel webinars can help you accelerate application performance.

Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from

the latest Intel processors and coprocessors. See abstracts and register >

http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
_______________________________________________



Dmtcp-forum mailing list

Dmtcp-forum@lists.sourceforge.net

https://lists.sourceforge.net/lists/listinfo/dmtcp-forum



                                                                                
  