Dear DMTCP team,

  I would appreciate your help with the issue below.


I am using a single machine to learn DMTCP. The operating system is "CentOS 
release 6.8", and the machine uses a network file system. I run a simple MPI 
program (dummy.c) using MPICH 3.2.


On terminal-1:

dmtcp_coordinator


On terminal-2:

dmtcp_launch mpiexec -n 3 ./dummy.mpich2 10 10000


While dummy is running in terminal-2, I move to terminal-1 and press 'c' to 
checkpoint, then 'q' to exit.


To restart, I run the generated dmtcp_restart_script.sh script, but I get the 
error below. Would you please advise on a possible fix for this issue?
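
For anyone reproducing this, the steps above can be sketched as a 
non-interactive script. This is an untested sketch, not the exact procedure 
used above: it assumes that dmtcp_command's --bcheckpoint and --quit options 
stand in for the 'c' and 'q' keys of the coordinator console, that 
dmtcp_coordinator accepts --daemon, and that a 5-second delay is enough for 
the job to start. The function name reproduce_restart is made up for 
illustration.

```shell
#!/bin/sh
# Sketch only: scripted version of the checkpoint/restart steps.
# Assumptions (not taken from the original report): the dmtcp_command
# options below replace the interactive 'c' and 'q' keys, and the
# 5-second sleep is an arbitrary warm-up delay.

reproduce_restart() {
    # Terminal-1 equivalent: start the coordinator in the background.
    dmtcp_coordinator --daemon

    # Terminal-2 equivalent: launch the MPI job under DMTCP.
    dmtcp_launch mpiexec -n 3 ./dummy.mpich2 10 10000 &

    # Give the job some time to start before checkpointing.
    sleep 5

    # Request a checkpoint and block until it is written ('c'),
    # then tell the coordinator to kill the job and exit ('q').
    dmtcp_command --bcheckpoint
    dmtcp_command --quit
    wait

    # Restart from the images via the generated script.
    ./dmtcp_restart_script.sh
}

# Nothing runs unless the script is invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    reproduce_restart
fi
```

Saved under any file name and invoked with the single argument "run" from the 
directory containing dummy.mpich2, it should go through the same sequence as 
the manual steps.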


(P.S. I tried the same steps on another machine (running Ubuntu 14.04) that has 
a local file system, and the restart worked successfully. Is there a specific 
configuration I should use with network file systems?)


size = 1
[43000] ERROR at fileconnlist.cpp:318 in recreateShmFileAndMap; 
REASON='JASSERT(fd != -1 || errno == EEXIST) failed'
     area.name = /ram/var/run/nscd/dbbxzrxW
dummy.mpich2 (43000): Terminating...
[44000] ERROR at fileconnlist.cpp:318 in recreateShmFileAndMap; 
REASON='JASSERT(fd != -1 || errno == EEXIST) failed'
     area.name = /ram/var/run/nscd/dbbxzrxW
dummy.mpich2 (44000): Terminating...
[42000] ERROR at fileconnlist.cpp:318 in recreateShmFileAndMap; 
REASON='JASSERT(fd != -1 || errno == EEXIST) failed'
     area.name = /ram/var/run/nscd/dbbxzrxW
dummy.mpich2 (42000): Terminating...
[40000] ERROR at connectionidentifier.h:96 in assertValid; 
REASON='JASSERT(strcmp(sign, HANDSHAKE_SIGNATURE_MSG) == 0) failed'
     sign =
Message: read invalid message, signature mismatch. (External socket?)
mpiexec.hydra (40000): Terminating...
[41000] ERROR at connectionidentifier.h:96 in assertValid; 
REASON='JASSERT(strcmp(sign, HANDSHAKE_SIGNATURE_MSG) == 0) failed'
     sign =
Message: read invalid message, signature mismatch. (External socket?)
hydra_pmi_proxy (41000): Terminating...



Best Regards,
Sara

Sara S. Hamouda
PhD Candidate (Computer Systems Group)
College of Engineering and Computer Science
The Australian National University