Hello,

I am running into an issue when using DMTCP to checkpoint and restart my program. After a restart everything works until my program has used up the memory in what was the pre-checkpoint heap. The next call to new or malloc() that needs to expand the heap then crashes the program.
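To watch the heap and the program break from inside the program, a minimal sketch like the one below could help. It assumes Linux and glibc. find_heap_mapping() and report_heap() are hypothetical helpers of my own, not part of DMTCP. They find the line labelled "[heap]" in /proc/self/maps and print its range next to the current break from sbrk(0).

```c
#define _DEFAULT_SOURCE /* exposes sbrk() in glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Find the /proc/self/maps line labelled "[heap]" and return its
 * address range. Returns 1 if found, 0 if no mapping carries the
 * label (e.g. after a restart where the mapping has lost its name). */
static int find_heap_mapping(unsigned long *start, unsigned long *end)
{
    char line[512];
    int found = 0;
    FILE *maps;

    assert(start != NULL && end != NULL);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[heap]") != NULL) {
            char *dash;
            /* Lines look like "03606000-037f8000 rwxp ..." (hex). */
            *start = strtoul(line, &dash, 16);
            *end = strtoul(dash + 1, NULL, 16);
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Print the [heap] mapping (if any) next to the current break. */
static void report_heap(const char *when)
{
    unsigned long start = 0, end = 0;
    unsigned long brk_now = (unsigned long)sbrk(0);

    if (find_heap_mapping(&start, &end))
        printf("%s: [heap] %#lx-%#lx, break %#lx\n",
               when, start, end, brk_now);
    else
        printf("%s: no [heap] mapping, break %#lx\n", when, brk_now);
}
```

Calling report_heap("before checkpoint") just before checkpointing, and report_heap("after restart") when dmtcp_checkpoint() returns DMTCP_AFTER_RESTART, should show whether the label and the break survive the restart.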
For example, immediately before checkpointing, /proc/<pid>/maps shows the heap at:

    03606000-037f8000 rwxp 00000000 00:00 0 [heap]

On restore (when dmtcp_checkpoint() == DMTCP_AFTER_RESTART), the same mapping entry appears, but without the "[heap]" name:

    03606000-037f8000 rwxp 00000000 00:00 0

Once a new string("Hello") is called whose returned memory would span the address 0x37f8000, my program crashes with a segmentation fault:

    #0  sysmalloc (nb=nb@entry=48, av=av@entry=0x7fd76d2b0c40 <main_arena>) at malloc.c:2723
    #1  0x00007fd76d1782c9 in _int_malloc (av=av@entry=0x7fd76d2b0c40 <main_arena>, bytes=bytes@entry=32) at malloc.c:4133
    #2  0x00007fd76d17956a in __GI___libc_malloc (bytes=32) at malloc.c:3057
    #3  0x00007fd7719b8208 in malloc (size=32) at alloc/mallocwrappers.cpp:41
    #4  0x00007fd76d4e7fd8 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #5  0x00000000014fe6b8 in doMemoryStuff (msg=0x7fd76a5b00b0 "Hello") at src/mucking/mystuff.cpp

Looking at DMTCP's code path when restarting from my checkpoint, I find that the restore_brk(..) function in src/mtcp/mtcp_restart.c returns without calling mtcp_sys_brk() because saved_brk < current_brk. The later call to mtcp_sys_brk() in restorememoryareas() does not fail, in that it does not return -1. However, if I insert a print of the result of a subsequent mtcp_sys_brk(NULL), it shows that the break has not changed to the saved_brk value. Then, when read_one_memory_area() reads in the "[heap]" area, this warning is emitted:

    WARNING: break (0x555555565000) not equal to end of heap (0x37f8000)

Checkpointing with either the --disable-alloc-plugin or the --disable-all-plugins option to dmtcp_launch does not seem to change the behaviour.

Do you have any suggestions on how I could solve or circumvent this issue?
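One detail that may matter for the "does not return -1" observation, assuming mtcp_sys_brk() is a thin wrapper around the raw brk system call (I have not confirmed this): according to the Linux brk(2) man page, the raw syscall does not return -1 on failure. It returns the current, unchanged break instead, and the kernel refuses to move the break below the start of the process's heap. The warning puts the restart process's break near 0x555555565000, so a request for 0x37f8000 would fall below it, which would be consistent with what I see. The sketch below shows that behaviour with hypothetical helpers raw_brk() and try_set_brk() (not DMTCP functions), assuming Linux and glibc's syscall().

```c
#define _DEFAULT_SOURCE /* exposes syscall() in glibc */
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue the raw brk system call, bypassing glibc's brk()/sbrk()
 * wrappers, and return exactly what the kernel returns. */
static unsigned long raw_brk(unsigned long addr)
{
    return (unsigned long)syscall(SYS_brk, addr);
}

/* Ask the kernel to move the break to `target`. Returns 1 only if the
 * break really ended up there; *now receives the break the kernel
 * reports. A refused request does NOT produce -1: the raw syscall
 * returns the current, unchanged break, so the only reliable check is
 * to compare the result with the target. */
static int try_set_brk(unsigned long target, unsigned long *now)
{
    assert(now != NULL);
    *now = raw_brk(target);
    return *now == target;
}
```

If mtcp_sys_brk() does behave like raw_brk() here, checking its result against -1 cannot detect the refusal, whereas comparing the returned value with saved_brk (as try_set_brk() does) would.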
dmtcp version: 3.0.0, master branch at 114f5d59961b6b3e178629b961ce58be17282403
uname info: Linux machine-1 4.19.0-9-cloud-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

Thanks,
Christine
_______________________________________________ Dmtcp-forum mailing list Dmtcp-forum@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dmtcp-forum