Hello! I have run into an error upon restart.
I am on DMTCP 2.4.4 and am running a Python script that calls several execs across a distributed environment. I checkpoint across 4 nodes and then restart on a single node, though some processes lag and are checkpointed on those external nodes. I correctly set the new host for the coordinator, and the hostfile is set to 127.0.0.1 for the number of hosts indicated in dmtcp_restart_script.sh.

When restarting, all processes read REFILLED except the original process (ID 40000) of the Python script, which reads DONE_QUERYING. I have not come across this error in the forums and was curious if you can advise where to go next.

The readdmtcp.sh script printed the following for the Python process checkpoint file:

*** ckpt_python2.7_1e0e8acb7e4b6b05-40000-57d7e91e.dmtcp is a gzipped file. Will uncompress it into ckpt_tmp.dmcp first.
Considering 'ckpt_tmp.dmtcp' as a ckpt image.
MTCP: MTCP_HEADER_v2.2
**** mtcp_restart (will be copied here): 0x2b5dd122e000-0x2b5dd1c2e000
**** DMTCP entry point (ThreadList::postRestart()): 0x2b5dd0023160
**** brk (sbrk(0)): 0x2223000
**** vdso: 0x7fff830ee000-0x7fff830ef000
**** vvar: 0x0-0x0
**** Listing ckpt image area:
0x400000-0x401000 r-xp /oasis/scratch/willfox/temp_project/tigres/environments/montageenv/bin/python2.7
0x600000-0x601000 rw-p /oasis/scratch/willfox/temp_project/tigres/environments/montageenv/bin/python2.7
0x1098000-0x2223000 rw-p [heap]
0x3902c00000-0x3902c20000 r-xp /lib64/ld-2.12.so
0x3902e1f000-0x3902e20000 r--p /lib64/ld-2.12.so
0x3902e20000-0x3902e21000 rw-p /lib64/ld-2.12.so
0x3902e21000-0x3902e22000 rw-p
0x3903000000-0x390318a000 r-xp /lib64/libc-2.12.so
0x390338a000-0x390338e000 r--p /lib64/libc-2.12.so
0x390338e000-0x390338f000 rw-p /lib64/libc-2.12.so
0x390338f000-0x3903394000 rw-p
0x3903400000-0x3903483000 r-xp /lib64/libm-2.12.so
0x3903682000-0x3903683000 r--p /lib64/libm-2.12.so
0x3903683000-0x3903684000 rw-p /lib64/libm-2.12.so
0x3903800000-0x3903802000 r-xp /lib64/libdl-2.12.so
0x3903a02000-0x3903a03000 r--p /lib64/libdl-2.12.so
0x3903a03000-0x3903a04000 rw-p /lib64/libdl-2.12.so
0x3903c00000-0x3903c17000 r-xp /lib64/libpthread-2.12.so
0x3903e17000-0x3903e18000 r--p /lib64/libpthread-2.12.so
0x3903e18000-0x3903e19000 rw-p /lib64/libpthread-2.12.so
0x3903e19000-0x3903e1d000 rw-p
0x3904000000-0x390400d000 r-xp /usr/lib64/libibverbs.so.1.0.0
0x390420c000-0x390420d000 rw-p /usr/lib64/libibverbs.so.1.0.0
0x3904c00000-0x3904c1d000 r-xp /lib64/libselinux.so.1
0x3904e1c000-0x3904e1d000 r--p /lib64/libselinux.so.1
0x3904e1d000-0x3904e1e000 rw-p /lib64/libselinux.so.1
0x3904e1e000-0x3904e1f000 rw-p
0x3905000000-0x3905016000 r-xp /lib64/libresolv-2.12.so
0x3905216000-0x3905217000 r--p /lib64/libresolv-2.12.so
0x3905217000-0x3905218000 rw-p /lib64/libresolv-2.12.so
0x3905218000-0x390521a000 rw-p
0x3908c00000-0x3908c04000 r-xp /lib64/libuuid.so.1.3.0
0x3908e03000-0x3908e04000 rw-p /lib64/libuuid.so.1.3.0
0x3909400000-0x39095b5000 r-xp /usr/lib64/libcrypto.so.1.0.1e
0x39097b5000-0x39097d0000 r--p /usr/lib64/libcrypto.so.1.0.1e
0x39097d0000-0x39097dc000 rw-p /usr/lib64/libcrypto.so.1.0.1e
0x39097dc000-0x39097e0000 rw-p
0x390e800000-0x390e802000 r-xp /lib64/libkeyutils.so.1.3
0x390ea01000-0x390ea02000 r--p /lib64/libkeyutils.so.1.3
0x390ea02000-0x390ea03000 rw-p /lib64/libkeyutils.so.1.3
0x390f000000-0x390f16f000 r-xp /lib64/libdb-4.7.so
0x390f36e000-0x390f374000 rw-p /lib64/libdb-4.7.so
0x390f800000-0x390f803000 r-xp /lib64/libcom_err.so.2.1
0x390fa02000-0x390fa03000 r--p /lib64/libcom_err.so.2.1
0x390fa03000-0x390fa04000 rw-p /lib64/libcom_err.so.2.1
0x3910000000-0x3910002000 r-xp /lib64/libutil-2.12.so
0x3910201000-0x3910202000 r--p /lib64/libutil-2.12.so
0x3910202000-0x3910203000 rw-p /lib64/libutil-2.12.so
0x3910800000-0x391083f000 r-xp /lib64/libgssapi_krb5.so.2.2
0x3910a3f000-0x3910a40000 r--p /lib64/libgssapi_krb5.so.2.2
0x3910a40000-0x3910a42000 rw-p /lib64/libgssapi_krb5.so.2.2
0x3910c00000-0x3910c0a000 r-xp /lib64/libkrb5support.so.0.1
0x3910e09000-0x3910e0a000 r--p /lib64/libkrb5support.so.0.1
0x3910e0a000-0x3910e0b000 rw-p /lib64/libkrb5support.so.0.1
0x3911000000-0x39110d5000 r-xp /lib64/libkrb5.so.3.3
0x39112d5000-0x39112de000 r--p /lib64/libkrb5.so.3.3
0x39112de000-0x39112e0000 rw-p /lib64/libkrb5.so.3.3
0x3911400000-0x391142a000 r-xp /lib64/libk5crypto.so.3.1
0x3911629000-0x391162b000 r--p /lib64/libk5crypto.so.3.1
0x391162b000-0x391162c000 rw-p /lib64/libk5crypto.so.3.1
0x3911800000-0x3911861000 r-xp /usr/lib64/libssl.so.1.0.1e
0x3911a61000-0x3911a65000 r--p /usr/lib64/libssl.so.1.0.1e
0x3911a65000-0x3911a6c000 rw-p /usr/lib64/libssl.so.1.0.1e
0x3919800000-0x391988c000 r-xp /usr/lib64/libsqlite3.so.0.8.6
0x3919a8b000-0x3919a8e000 rw-p /usr/lib64/libsqlite3.so.0.8.6
0x3919a8e000-0x3919a8f000 rw-p
0x2b5dcf34e000-0x2b5dcf350000 rw-p
0x2b5dcf350000-0x2b5dcf35c000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_infiniband.so
0x2b5dcf55c000-0x2b5dcf55d000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_infiniband.so
0x2b5dcf55d000-0x2b5dcf55f000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_alloc.so
0x2b5dcf75e000-0x2b5dcf75f000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_alloc.so
0x2b5dcf75f000-0x2b5dcf761000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_dl.so
0x2b5dcf960000-0x2b5dcf961000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_dl.so
0x2b5dcf961000-0x2b5dcf962000 rw-p
0x2b5dcf962000-0x2b5dcf9c2000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_ipc.so
0x2b5dcfbc1000-0x2b5dcfbc4000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_ipc.so
0x2b5dcfbc4000-0x2b5dcfbc5000 rw-p
0x2b5dcfbc5000-0x2b5dcfbdf000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_svipc.so
0x2b5dcfddf000-0x2b5dcfde0000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_svipc.so
0x2b5dcfde0000-0x2b5dcfdf0000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_timer.so
0x2b5dcfff0000-0x2b5dcfff1000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_timer.so
0x2b5dcfff1000-0x2b5dcfff2000 rw-p
0x2b5dcfff2000-0x2b5dd006c000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp.so
0x2b5dd026b000-0x2b5dd026e000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp.so
0x2b5dd026e000-0x2b5dd0273000 rw-p
0x2b5dd0273000-0x2b5dd028b000 r-xp /home/willfox/local/lib/dmtcp/libdmtcp_pid.so
0x2b5dd048b000-0x2b5dd048c000 rw-p /home/willfox/local/lib/dmtcp/libdmtcp_pid.so
0x2b5dd048c000-0x2b5dd0646000 r-xp /opt/python/lib/libpython2.7.so.1.0
0x2b5dd0845000-0x2b5dd0885000 rw-p /opt/python/lib/libpython2.7.so.1.0
0x2b5dd0885000-0x2b5dd0894000 rw-p
0x2b5dd08ab000-0x2b5dd08ae000 rw-p
0x2b5dd08ae000-0x2b5dd0999000 r-xp /opt/gnu/gcc/lib64/libstdc++.so.6.0.18
0x2b5dd0b98000-0x2b5dd0ba0000 r--p /opt/gnu/gcc/lib64/libstdc++.so.6.0.18
0x2b5dd0ba0000-0x2b5dd0ba2000 rw-p /opt/gnu/gcc/lib64/libstdc++.so.6.0.18
0x2b5dd0ba2000-0x2b5dd0bb7000 rw-p
0x2b5dd0bb7000-0x2b5dd0bcc000 r-xp /opt/gnu/gcc/lib64/libgcc_s.so.1
0x2b5dd0dcc000-0x2b5dd0dcd000 rw-p /opt/gnu/gcc/lib64/libgcc_s.so.1
0x2b5dd0dcd000-0x2b5dd0dd4000 r-xp /lib64/librt-2.12.so
0x2b5dd0fd3000-0x2b5dd0fd4000 r--p /lib64/librt-2.12.so
0x2b5dd0fd4000-0x2b5dd0fd5000 rw-p /lib64/librt-2.12.so
0x2b5dd0fd5000-0x2b5dd0fe3000 rw-p
0x2b5dd120d000-0x2b5dd122d000 rw-p
0x2b5dd122d000-0x2b5dd122e000 r--p
0x2b5dd1c2e000-0x2b5dd1c2f000 r--p
0x2b5dd1c2f000-0x2b5dd1c30000 ---p
0x2b5dd1c30000-0x2b5dd1f30000 rw-p
0x2b5dd1f30000-0x2b5dd1f45000 r-xp /opt/python/lib/python2.7/lib-dynload/datetime.so
0x2b5dd2145000-0x2b5dd2149000 rw-p /opt/python/lib/python2.7/lib-dynload/datetime.so
0x2b5dd214a000-0x2b5dd218a000 rw-p
0x2b5dd21ca000-0x2b5dd224a000 rw-p
0x2b5dd228a000-0x2b5dd22ca000 rw-p
0x2b5dd2321000-0x2b5dd81b2000 r--p /usr/lib/locale/locale-archive
0x2b5dd81b3000-0x2b5dd81f3000 rw-p
0x2b5dd81f4000-0x2b5dd8234000 rw-p
0x2b5dd823c000-0x2b5dd827c000 rw-p
0x2b5dd827c000-0x2b5dd8280000 r-xp /opt/python/lib/python2.7/lib-dynload/_locale.so
0x2b5dd8480000-0x2b5dd8481000 rw-p /opt/python/lib/python2.7/lib-dynload/_locale.so
0x2b5dd8482000-0x2b5dd8582000 rw-p
0x2b5dd8681000-0x2b5dd8701000 rw-p
0x2b5dd8701000-0x2b5dd8709000 r-xp /opt/python/lib/python2.7/lib-dynload/_collections.so
0x2b5dd8908000-0x2b5dd890a000 rw-p /opt/python/lib/python2.7/lib-dynload/_collections.so
0x2b5dd890a000-0x2b5dd8912000 r-xp /opt/python/lib/python2.7/lib-dynload/operator.so
0x2b5dd8b12000-0x2b5dd8b14000 rw-p /opt/python/lib/python2.7/lib-dynload/operator.so
0x2b5dd8b15000-0x2b5dd8b20000 r-xp /opt/python/lib/python2.7/lib-dynload/itertools.so
0x2b5dd8d1f000-0x2b5dd8d24000 rw-p /opt/python/lib/python2.7/lib-dynload/itertools.so
0x2b5dd8d24000-0x2b5dd8d27000 r-xp /opt/python/lib/python2.7/lib-dynload/_heapq.so
0x2b5dd8f27000-0x2b5dd8f29000 rw-p /opt/python/lib/python2.7/lib-dynload/_heapq.so
0x2b5dd8f29000-0x2b5dd8f2f000 r-xp /opt/python/lib/python2.7/lib-dynload/strop.so
0x2b5dd912e000-0x2b5dd9130000 rw-p /opt/python/lib/python2.7/lib-dynload/strop.so
0x2b5dd9131000-0x2b5dd9134000 r-xp /opt/python/lib/python2.7/lib-dynload/_functools.so
0x2b5dd9333000-0x2b5dd9334000 rw-p /opt/python/lib/python2.7/lib-dynload/_functools.so
0x2b5dd9334000-0x2b5dd933d000 r-xp /opt/python/lib/python2.7/lib-dynload/_struct.so
0x2b5dd953d000-0x2b5dd953f000 rw-p /opt/python/lib/python2.7/lib-dynload/_struct.so
0x2b5dd9540000-0x2b5dd9580000 rw-p
0x2b5dd9581000-0x2b5dd95a3000 r-xp /opt/python/lib/python2.7/lib-dynload/_ctypes.so
0x2b5dd97a2000-0x2b5dd97a7000 rw-p /opt/python/lib/python2.7/lib-dynload/_ctypes.so
0x2b5dd97a7000-0x2b5dd97a8000 rwxp
0x2b5dd97aa000-0x2b5dd97c9000 r-xp /opt/python/lib/python2.7/lib-dynload/_io.so
0x2b5dd99c8000-0x2b5dd99d2000 rw-p /opt/python/lib/python2.7/lib-dynload/_io.so
0x2b5dd9a12000-0x2b5dd9a52000 rw-p
0x2b5dd9a52000-0x2b5dd9a5e000 r-xp /opt/python/lib/python2.7/lib-dynload/math.so
0x2b5dd9c5d000-0x2b5dd9c5f000 rw-p /opt/python/lib/python2.7/lib-dynload/math.so
0x2b5dd9c5f000-0x2b5dd9c64000 r-xp /opt/python/lib/python2.7/lib-dynload/binascii.so
0x2b5dd9e63000-0x2b5dd9e64000 rw-p /opt/python/lib/python2.7/lib-dynload/binascii.so
0x2b5dd9e64000-0x2b5dd9e7c000 r-xp /oasis/scratch/willfox/temp_project/campenv/zlib/lib/libz.so.1.2.8
0x2b5dda07b000-0x2b5dda07c000 rw-p /oasis/scratch/willfox/temp_project/campenv/zlib/lib/libz.so.1.2.8
0x2b5dda07d000-0x2b5dda082000 r-xp /opt/python/lib/python2.7/lib-dynload/_hashlib.so
0x2b5dda282000-0x2b5dda283000 rw-p /opt/python/lib/python2.7/lib-dynload/_hashlib.so
0x2b5dda283000-0x2b5dda286000 r-xp /opt/python/lib/python2.7/lib-dynload/_random.so
0x2b5dda485000-0x2b5dda486000 rw-p /opt/python/lib/python2.7/lib-dynload/_random.so
0x2b5dda486000-0x2b5dda48a000 r-xp /opt/python/lib/python2.7/lib-dynload/cStringIO.so
0x2b5dda689000-0x2b5dda68b000 rw-p /opt/python/lib/python2.7/lib-dynload/cStringIO.so
0x2b5dda68b000-0x2b5dda68e000 r-xp /opt/python/lib/python2.7/lib-dynload/fcntl.so
0x2b5dda88d000-0x2b5dda88f000 rw-p /opt/python/lib/python2.7/lib-dynload/fcntl.so
0x2b5dda88f000-0x2b5dda898000 r-xp /opt/python/lib/python2.7/lib-dynload/_json.so
0x2b5ddaa98000-0x2b5ddaa99000 rw-p /opt/python/lib/python2.7/lib-dynload/_json.so
0x2b5ddac8f000-0x2b5ddad50000 rw-p
0x2b5ddad50000-0x2b5ddad54000 r-xp /opt/python/lib/python2.7/lib-dynload/time.so
0x2b5ddaf54000-0x2b5ddaf56000 rw-p /opt/python/lib/python2.7/lib-dynload/time.so
0x2b5ddaf56000-0x2b5ddaf66000 r-xp /opt/python/lib/python2.7/lib-dynload/_socket.so
0x2b5ddb166000-0x2b5ddb16b000 rw-p /opt/python/lib/python2.7/lib-dynload/_socket.so
0x2b5ddb16b000-0x2b5ddb17f000 r-xp /opt/python/lib/python2.7/lib-dynload/_ssl.so
0x2b5ddb37f000-0x2b5ddb383000 rw-p /opt/python/lib/python2.7/lib-dynload/_ssl.so
0x2b5ddb383000-0x2b5ddb394000 r-xp /opt/python/lib/python2.7/lib-dynload/_sqlite3.so
0x2b5ddb594000-0x2b5ddb597000 rw-p /opt/python/lib/python2.7/lib-dynload/_sqlite3.so
0x2b5ddb597000-0x2b5ddb59c000 r-xp /opt/python/lib/python2.7/lib-dynload/select.so
0x2b5ddb79b000-0x2b5ddb79d000 rw-p /opt/python/lib/python2.7/lib-dynload/select.so
0x2b5ddb7de000-0x2b5ddb81e000 rw-p
0x2b5ddb85f000-0x2b5ddb8df000 rw-p
0x2b5ddb8e0000-0x2b5ddb8e6000 r-xp /opt/python/lib/python2.7/lib-dynload/_multiprocessing.so
0x2b5ddbae6000-0x2b5ddbae7000 rw-p /opt/python/lib/python2.7/lib-dynload/_multiprocessing.so
0x2b5ddbae7000-0x2b5ddbafd000 r-xp /opt/python/lib/python2.7/lib-dynload/cPickle.so
0x2b5ddbcfc000-0x2b5ddbcfe000 rw-p /opt/python/lib/python2.7/lib-dynload/cPickle.so
0x2b5ddbcfe000-0x2b5ddbd3e000 rw-p
0x2b5ddbd3e000-0x2b5ddbd48000 r-xp /opt/python/lib/python2.7/lib-dynload/array.so
0x2b5ddbf47000-0x2b5ddbf4a000 rw-p /opt/python/lib/python2.7/lib-dynload/array.so
0x2b5ddbf4b000-0x2b5ddbf4f000 rw-p
0x2b5ddbf50000-0x2b5ddbfa0000 rw-p
0x2b5ddbfa1000-0x2b5ddbfdd000 r-xp /opt/python/lib/python2.7/lib-dynload/pyexpat.so
0x2b5ddc1dc000-0x2b5ddc1e0000 rw-p /opt/python/lib/python2.7/lib-dynload/pyexpat.so
0x2b5ddc1e1000-0x2b5ddc1e3000 r-xp /opt/python/lib/python2.7/lib-dynload/grp.so
0x2b5ddc3e3000-0x2b5ddc3e4000 rw-p /opt/python/lib/python2.7/lib-dynload/grp.so
0x2b5ddc426000-0x2b5ddc4a6000 rw-p
0x2b5ddc4e6000-0x2b5ddc526000 rw-p
0x2b5ddc526000-0x2b5ddc528000 r-xp /opt/python/lib/python2.7/lib-dynload/_bisect.so
0x2b5ddc728000-0x2b5ddc729000 rw-p /opt/python/lib/python2.7/lib-dynload/_bisect.so
0x2b5ddc729000-0x2b5ddc72e000 r-xp /opt/python/lib/python2.7/lib-dynload/zlib.so
0x2b5ddc92d000-0x2b5ddc92f000 rw-p /opt/python/lib/python2.7/lib-dynload/zlib.so
0x2b5ddc930000-0x2b5ddc970000 rw-p
0x2b5ddc971000-0x2b5ddc9f1000 rw-p
0x2b5ddc9f2000-0x2b5ddc9f6000 r-xp /opt/python/lib/python2.7/lib-dynload/_lsprof.so
0x2b5ddcbf5000-0x2b5ddcbf7000 rw-p /opt/python/lib/python2.7/lib-dynload/_lsprof.so
0x2b5ddcbf8000-0x2b5ddcbfc000 r-xp /opt/python/lib/python2.7/lib-dynload/termios.so
0x2b5ddcdfc000-0x2b5ddcdfe000 rw-p /opt/python/lib/python2.7/lib-dynload/termios.so
0x2b5ddcdff000-0x2b5ddce3f000 rw-p
0x2b5ddce7f000-0x2b5ddcf40000 rw-p
0x2b5ddcfc0000-0x2b5ddd3c1000 rw-p
0x2b5ddd400000-0x2b5ddd780000 rw-p
0x2b5ddd797000-0x2b5ddd7a3000 r-xp /lib64/libnss_files-2.12.so
0x2b5ddd9a3000-0x2b5ddd9a4000 r--p /lib64/libnss_files-2.12.so
0x2b5ddd9a4000-0x2b5ddd9a5000 rw-p /lib64/libnss_files-2.12.so
0x2b5ddd9a5000-0x2b5ddd9e5000 rw-p
0x2b5ddd9e5000-0x2b5ddd9e6000 ---p
0x2b5ddd9e6000-0x2b5dde326000 rw-p
0x2b5dde327000-0x2b5ddf3a8000 rw-p
0x2b5ddf3b1000-0x2b5ddf871000 rw-p
0x2b5ddf872000-0x2b5ddf9f2000 rw-p
0x2b5ddf9f2000-0x2b5ddf9f3000 rw-s /dev/shm/sem.l2lXlV (deleted)
0x2b5ddf9f3000-0x2b5ddf9f4000 rw-s /dev/shm/sem.Lzelj3 (deleted)
0x2b5ddf9f4000-0x2b5ddf9f5000 rw-s /dev/shm/sem.DVzKgb (deleted)
0x2b5ddf9f5000-0x2b5ddf9f6000 rw-s /dev/shm/sem.9RGaej (deleted)
0x2b5ddf9f6000-0x2b5ddf9f7000 rw-s /dev/shm/sem.FanBbr (deleted)
0x2b5ddf9f7000-0x2b5ddf9f8000 rw-s /dev/shm/sem.Tjz28y (deleted)
0x2b5ddf9f8000-0x2b5ddf9f9000 rw-s /dev/shm/sem.JoUu6G (deleted)
0x2b5ddf9f9000-0x2b5ddf9fa000 rw-s /dev/shm/sem.bXHX3O (deleted)
0x2b5ddf9fa000-0x2b5ddf9fb000 rw-s /dev/shm/sem.V43q1W (deleted)
0x2b5ddf9fb000-0x2b5ddfb43000 rw-p
0x2b5ddfb5a000-0x2b5ddfb6a000 rw-p
0x2b5ddff43000-0x2b5ddff83000 rw-p
0x2b5de0000000-0x2b5de0021000 rw-p
0x2b5de0021000-0x2b5de4000000 ---p
0x7fff82884000-0x7fff83072000 rw-p [stack]

--
William Fox
Lawrence Berkeley National Laboratory
Computational Research Division
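
For anyone scanning a long area listing like the one above, here is a minimal Python sketch that pulls out the shared mappings and the mappings backed by deleted files (for example the /dev/shm/sem.* entries). It assumes one "start-end perms [path]" entry per line, as in the listing; the names AREA_RE, parse_areas, and shared_or_deleted are hypothetical helpers and not part of DMTCP. Whether such mappings have anything to do with the DONE_QUERYING state is not established here.

```python
import re

# One area entry: "0xSTART-0xEND perms [optional path]" (layout assumed from the listing).
AREA_RE = re.compile(r"^(0x[0-9a-f]+)-(0x[0-9a-f]+)\s+([r-][w-][x-][ps])\s*(.*)$")


def parse_areas(text):
    """Parse area lines into dicts; lines that do not match are skipped."""
    areas = []
    for line in text.splitlines():
        m = AREA_RE.match(line.strip())
        if m is None:
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2), 16)
        areas.append({
            "start": start,
            "end": end,
            "size": end - start,
            "perms": m.group(3),
            "path": m.group(4).strip(),  # empty string for anonymous regions
        })
    return areas


def shared_or_deleted(areas):
    """Return areas that are shared (perms end in 's') or backed by a deleted file."""
    return [a for a in areas
            if a["perms"].endswith("s") or a["path"].endswith("(deleted)")]


# A few entries copied from the listing, used only as sample input.
SAMPLE = """\
0x1098000-0x2223000 rw-p [heap]
0x3902e21000-0x3902e22000 rw-p
0x2b5ddf9f2000-0x2b5ddf9f3000 rw-s /dev/shm/sem.l2lXlV (deleted)
0x7fff82884000-0x7fff83072000 rw-p [stack]
"""

if __name__ == "__main__":
    for area in shared_or_deleted(parse_areas(SAMPLE)):
        print(hex(area["start"]), area["perms"], area["size"], area["path"])
```

Saving the readdmtcp.sh output to a file and passing its contents to parse_areas gives the same summary for the full image.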