Hello,
I've been trying to get dmtcp-4.4 to checkpoint a process that uses ssh to
connect to its own host and then launches a Python multiprocessing step. I
limit the multiprocessing to a single worker, so there are three pieces in
total: the program that originates the ssh, one process acting as the ssh
client, and one multiprocessing worker.
When restarting on the same host I get no errors, but if I move the
checkpoints to a new host I receive a number of errors, the first of which
is a bind error. Am I executing the restart incorrectly?
Note that this occurs both with and without --ib.
I appreciate any help that can be given.
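For reference, the multiprocessing step described above is roughly of the
following shape. This is only a minimal sketch, not the actual program: the
names work and run_step, the squaring computation, and the explicit choice
of the fork start method (which assumes Linux) are all hypothetical
placeholders.

```python
import multiprocessing


def work(x):
    # Hypothetical stand-in for the real per-item computation.
    return x * x


def run_step(items):
    # Assumption: Linux, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    # processes=1 limits the pool to a single worker process.
    with ctx.Pool(processes=1) as pool:
        return pool.map(work, items)


if __name__ == "__main__":
    print(run_step([1, 2, 3]))
```

The whole thing (the ssh step plus this script) is what runs under DMTCP;
only the pool size of 1 is taken from my actual setup.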
Ifconfig uses the ioctl access method to get the full address information,
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are
displayed correctly.
Ifconfig is obsolete! For replacement check ip.
[62000] WARNING at socketconnection.cpp:540 in postRestart;
REASON='JWARNING(_real_bind(_fds[0], (sockaddr*) &_bindAddr,_bindAddrlen)
== 0) failed'
(strerror((*__errno_location ()))) = Cannot assign requested address
id() = 36ff986ad20c9a14-62000-5783f119(99526)
Message: Bind failed.
[65000] WARNING at processinfo.cpp:373 in restoreProcessGroupInfo;
REASON='JWARNING(setpgid(0, _gid) == 0) failed'
_gid = 17611
(strerror((*__errno_location ()))) = Operation not permitted
Message: Cannot change group information
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.qpUZHk
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.HTMVTa
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.SG5R50
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.AAYOhR
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.OWQMtH
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.jMhLFx
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.Bz3JRn
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.EjqK3d
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
missingUnlinkedShmFiles[i].name = /dev/shm/sem.VLzLf4
(strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum