Hello,

I've been trying to get dmtcp-4.4 to checkpoint a process that uses ssh to
connect to its own host and then launches a Python multiprocessing step.  I
limit the multiprocessing to a single worker, giving a total of one program
originating the ssh, one process as a client, and one worker performing the
multiprocessing step.
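
For reference, the workload has roughly the shape sketched below. This is an
illustrative reconstruction and not the actual script: the function names, the
--launch/--worker flags, the use of "localhost", and math.sqrt as the pooled
task are all assumptions made for the example.

```python
import math
import multiprocessing
import subprocess
import sys


def build_ssh_command(host, script_path):
    """Return the argv that re-runs this script in worker mode over ssh."""
    # Hypothetical layout: the same script serves as launcher and worker.
    return ["ssh", host, sys.executable, script_path, "--worker"]


def run_worker_step(values):
    """Run the multiprocessing step with the pool limited to one worker."""
    # "fork" is requested explicitly; the setup described here is Linux-only.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=1) as pool:
        return pool.map(math.sqrt, values)


def main(argv):
    """Dispatch on the (hypothetical) mode flag given on the command line."""
    if "--worker" in argv:
        # Far side of the ssh connection: the multiprocessing step.
        print(run_worker_step([1.0, 4.0, 9.0]))
    elif "--launch" in argv:
        # Originating side: ssh back into the same host.
        return subprocess.call(build_ssh_command("localhost", argv[0]))
    else:
        print("usage: {} --launch | --worker".format(argv[0]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Started with --launch, this gives the three pieces described above: the
originating program, the ssh client it spawns, and a single pool worker on the
far side of the connection.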

Restarting on the same host produces no errors, but if I move the checkpoint
files to a new host, the restart reports a number of warnings, the first of
which is a bind failure ("Cannot assign requested address").  Am I executing
the restart incorrectly?

Note that this occurs both with and without --ib.
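
For context on the bind warning in the log: "Cannot assign requested address"
is the message for errno EADDRNOTAVAIL, which bind() returns when the requested
IP address is not configured on the local machine. The sketch below shows that
behaviour outside of DMTCP. It says nothing about DMTCP internals, and it
assumes that 192.0.2.1 (an address reserved for documentation) is not assigned
to any local interface; try_bind is a name made up for the example.

```python
import errno
import socket


def try_bind(address, port=0):
    """Try to bind a TCP socket; return 0 on success or the errno on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the kernel pick any free port.
            sock.bind((address, port))
        except OSError as exc:
            return exc.errno
    return 0


# Loopback is local, so the bind is expected to succeed.
print(try_bind("127.0.0.1"))
# Assumed non-local address: the bind is expected to fail with EADDRNOTAVAIL.
print(try_bind("192.0.2.1") == errno.EADDRNOTAVAIL)
```

If the checkpoint recorded an address that belongs to the original host, a
failure of this kind on the new host would be consistent with the first
warning, though I have not confirmed that this is what happens here.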

I appreciate any help that can be given.

Ifconfig uses the ioctl access method to get the full address information,
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are
displayed correctly.
Ifconfig is obsolete! For replacement check ip.
[62000] WARNING at socketconnection.cpp:540 in postRestart;
REASON='JWARNING(_real_bind(_fds[0], (sockaddr*) &_bindAddr,_bindAddrlen)
== 0) failed'
     (strerror((*__errno_location ()))) = Cannot assign requested address
     id() = 36ff986ad20c9a14-62000-5783f119(99526)
Message: Bind failed.
[65000] WARNING at processinfo.cpp:373 in restoreProcessGroupInfo;
REASON='JWARNING(setpgid(0, _gid) == 0) failed'
     _gid = 17611
     (strerror((*__errno_location ()))) = Operation not permitted
Message: Cannot change group information
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.qpUZHk
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.HTMVTa
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.SG5R50
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.AAYOhR
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.OWQMtH
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.jMhLFx
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.Bz3JRn
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.EjqK3d
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
[62000] WARNING at fileconnlist.cpp:211 in resume;
REASON='JWARNING(unlink(missingUnlinkedShmFiles[i].name) != -1) failed'
     missingUnlinkedShmFiles[i].name = /dev/shm/sem.VLzLf4
     (strerror((*__errno_location ()))) = No such file or directory
Message: The file was unlinked at the time of checkpoint. Unlinking it
after restart failed
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
