P.S. One way to turn off address space randomization on a per-process basis is the -R flag of 'setarch'.
- Gene
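
For readers who want to try the per-process approach, here is a minimal shell sketch. It assumes a Linux system with the util-linux 'setarch' command on the PATH. The helper names describe_aslr and run_without_aslr are hypothetical and are not part of DMTCP or util-linux. Wrapping dmtcp_launch or dmtcp_restart this way is an assumption and has not been verified as a workaround for this bug.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical.

# Translate a value of /proc/sys/kernel/randomize_va_space into words.
#   0 = randomization off
#   1 = stack, mmap base and vdso randomized
#   2 = as 1, plus the brk heap
describe_aslr() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "partial" ;;
    2) echo "full" ;;
    *) echo "unknown" ;;
  esac
}

# Run a command with randomization disabled for that command only.
# 'setarch <arch> -R' sets the ADDR_NO_RANDOMIZE personality flag and then
# executes the command; the system-wide sysctl is left untouched.
run_without_aslr() {
  setarch "$(uname -m)" -R "$@"
}

# Report the current system-wide setting, if it is readable.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
  echo "system-wide ASLR: $(describe_aslr "$(cat /proc/sys/kernel/randomize_va_space)")"
fi
```

A possible use would be 'run_without_aslr dmtcp_launch ./a.out' and later 'run_without_aslr dmtcp_restart ckpt_a.out_*.dmtcp'. The personality flag is normally inherited by child processes, so the wrapper should only be needed on the top-level command.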

----- Original Message -----
From: Gene Cooperman <g...@ccs.neu.edu>
To: Ankit Garg <ankit_g...@mentor.com>
Cc: Gene Cooperman <g...@ccs.neu.edu>, dmtcp-forum@lists.sourceforge.net
Sent: Sun, 12 Apr 2015 00:02:49 -0400 (EDT)
Subject: Re: [Dmtcp-forum] Trying to use mutl-arch support having mixture of 64/32 bit executable

Hi Ankit,
I can report some progress. I was able to reproduce the bug on a CentOS 6 computer. I would guess that the bug appears for Red Hat versions 5 and 6, and the derived distros.
In addition, the DMTCP bug appears to be related to address space randomization. I turned off address space randomization:
  sudo bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space'
and the bug went away.
Naturally, DMTCP is supposed to also work correctly when address space randomization is turned on. I'll continue to work on that case. In the meantime, you can use DMTCP in your app, if you're in a position to turn off the randomization. There are also methods to turn off randomization on a per-process basis, but that would be painful if you have to do it for each process.

Best wishes,
- Gene

----- Original Message -----
From: Ankit Garg <ankit_g...@mentor.com>
To: Gene Cooperman <g...@ccs.neu.edu>
Cc: dmtcp-forum@lists.sourceforge.net
Sent: Fri, 10 Apr 2015 08:45:39 -0400 (EDT)
Subject: Re: [Dmtcp-forum] Trying to use mutl-arch support having mixture of 64/32 bit executable

Here is the output

DMTCP version: 2.4.0-rc1
Date built: Fri Apr 10 18:14:45 IST 2015
config.log: ./configure --prefix=/in/TBX_SANDBOXES/agarg/dmtcp/tbx_dev/src/mct/dmtcp/dmtcp_install --enable-realtime-ckpt-signal
Description: Red Hat Enterprise Linux Client release 5.8 (Tikanga)
Codename: Tikanga
Linux inndt232 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
Compiler: gcc
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /y/jon/gcc/gcc412/gcc-4.1.2/configure --prefix=/tools/linux64/gcc-4.1.2
Thread model: posix
gcc version 4.1.2
CFLAGS: -g -O0
CXXFLAGS: -g -O2
CPPFLAGS:
LDFLAGS:
Error: could not find libjava.so
Error: could not find Java 2 Runtime Environment.
lrwxrwxrwx 1 root root 11 Mar 25 2014 /lib64/libc.so.6 -> libc-2.5.so
lrwxrwxrwx 1 root root 11 Mar 25 2014 /lib/libc.so.6 -> libc-2.5.so
-rw------- 1 root root 217016 Apr 10 18:13 /var/db/nscd/group
-rw------- 1 root root 217016 Apr 10 18:14 /var/db/nscd/hosts
-rw------- 1 root root 1801163 Apr 10 18:14 /var/db/nscd/passwd
-rw-r--r-- 1 root root 5 Feb 22 14:46 /var/run/nscd/nscd.pid
srw-rw-rw- 1 root root 0 Feb 22 14:46 /var/run/nscd/socket

On 04/10/2015 06:12 PM, Gene Cooperman wrote:
> Hi Ankit,
> Interesting. So, the issue seems to be what environment.
> Could you do:
>   make display-build-env
> in the root directory of DMTCP, and send me the output?
> I'll see if I can locate a similar environment to the
> one that you're using. If not, is there some possibility
> of getting a guest account, or even doing a screen-sharing
> session together while talking on the phone?
>
> Best,
> - Gene
>
> ----- Original Message -----
> From: Ankit Garg <ankit_g...@mentor.com>
> To: Gene Cooperman <g...@ccs.neu.edu>
> Cc: dmtcp-forum@lists.sourceforge.net
> Sent: Fri, 10 Apr 2015 06:57:47 -0400 (EDT)
> Subject: Re: [Dmtcp-forum] Trying to use mutl-arch support having mixture of 64/32 bit executable
>
> Hi Gene,
> Unfortunately this script is not working for me and showing
> the same seg-fault. Here are the screen logs
>
>
> dmtcp_coordinator starting...
> Host: inndt232 (137.202.214.37)
> Port: 7779
> Checkpoint Interval: 6
> Exit on last client: 1
> Backgrounding...
> [40000] NOTE at socketconnlist.cpp:156 in scanForPreExisting;
> REASON='found pre-existing socket... will not be restored'
> fd = 11
> device = socket:[418384]
> [40000] WARNING at socketconnection.cpp:192 in TcpConnection;
> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain == AF_INET6) && (type & 077) == SOCK_STREAM) failed'
> domain = 0
> type = 0
> protocol = 0
> [40000] NOTE at socketconnlist.cpp:156 in scanForPreExisting;
> REASON='found pre-existing socket... will not be restored'
> fd = 12
> device = socket:[418386]
> [40000] WARNING at socketconnection.cpp:192 in TcpConnection;
> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain == AF_INET6) && (type & 077) == SOCK_STREAM) failed'
> domain = 0
> type = 0
> protocol = 0
> [40000] NOTE at socketconnlist.cpp:156 in scanForPreExisting;
> REASON='found pre-existing socket... will not be restored'
> fd = 13
> device = socket:[418480]
> [40000] WARNING at socketconnection.cpp:192 in TcpConnection;
> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain == AF_INET6) && (type & 077) == SOCK_STREAM) failed'
> domain = 0
> type = 0
> protocol = 0
> [40000] NOTE at socketconnlist.cpp:156 in scanForPreExisting;
> REASON='found pre-existing socket... will not be restored'
> fd = 14
> device = socket:[453279]
> [40000] WARNING at socketconnection.cpp:192 in TcpConnection;
> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain == AF_INET6) && (type & 077) == SOCK_STREAM) failed'
> domain = 0
> type = 0
> protocol = 0
> 1 2 3 4 5 6 7 ./multi-arch.sh: line 21: 9319 Killed
> $dir/dmtcp-multi-arch-build/bin/dmtcp_launch -i6 ./a.out
> dmtcp_coordinator starting...
> Host: inndt232 (137.202.214.37)
> Port: 7779
> Checkpoint Interval: disabled (checkpoint manually instead)
> Exit on last client: 1
> Backgrounding...
> *[9344] mtcp_restart.c:607 unmap_memory_areas_and_restore_vdso:
> ***WARNING: munmap(0xffffe000, 4096) failed: 22
> ./multi-arch.sh: line 22: 9344 Segmentation fault (core dumped)
> $dir/dmtcp-multi-arch-build/bin/dmtcp_restart ckpt_a.out_*.dmtcp
> **** The restarted application should again be printing. *****
>
>
> Regards
> Ankit
>
>
>
> On 04/10/2015 04:19 PM, Gene Cooperman wrote:
>> Hi Ankit,
>> Actually, I just tried again to reproduce the bug and failed this time.
>> So, I'm still looking.
>> Would you mind executing the attached script multi-arch.sh , just to
>> double-check that we're seeing the same thing? This script succeeds for me.
>> (Please do scan the script beforehand, so that you know what you'll
>> be executing.)
>> I'll have to turn to something else right now, but by the end of
>> today or the weekend, I can take another look.
>>
>> Thanks,
>> - Gene
>>
>>
>> On Fri, Apr 10, 2015 at 04:08:53PM +0530, Ankit Garg wrote:
>>> Hi Gene,
>>> I pulled the git repository https://github.com/dmtcp/dmtcp.git.
>>>
>>> dmtcp_launch --version
>>> dmtcp_launch (DMTCP) 2.4.0-rc1
>>>
>>> Just for the sake of understanding, could you explain the bug. Is it
>>> related to stack restoration during restart ?
>>>
>>> Thanks for prompt reply.
>>>
>>> Regards
>>> Ankit
>>>
>>>
>>>
>>> On 04/10/2015 04:02 PM, Gene Cooperman wrote:
>>>> Hi Ankit,
>>>> I can confirm this bug on the development branch of DMTCP from the
>>>> github repo. Thanks very much for reporting this. I'm hoping to
>>>> report back with a bug fix soon, but please don't hesitate to write
>>>> for the status.
>>>> Also, just for completeness, which version of DMTCP were you using?
>>>>
>>>> Thanks,
>>>> - Gene
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 01:01:40PM +0530, Ankit Garg wrote:
>>>>> Hi Gene,
>>>>> Thanks for the reply. GDB is showing following stack on
>>>>> seg-fault
>>>>>
>>>>> #0 0x558f23a2 in ?? ()
>>>>> #1 0x558f6d68 in ?? ()
>>>>> #2 0x558f6a74 in ?? ()
>>>>> #3 0x0804c525 in restart_fast_path () at mtcp_restart.c:454
>>>>>
>>>>> Above line is
>>>>>
>>>>> *rinfo.restorememoryareas_fptr(&rinfo);*
>>>>>
>>>>> within function *restart_fast_path*
>>>>>
>>>>>
>>>>> Following is the list of directories for the installed area
>>>>>
>>>>> .:
>>>>> total 28
>>>>> drwxr-xr-x 6 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 18 agarg medrd 8192 Apr 10 12:48 ..
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 bin
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 include
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 lib
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:44 share
>>>>>
>>>>> ./bin:
>>>>> total 9400
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 6 agarg medrd 4096 Apr 10 12:44 ..
>>>>> -rwxr-xr-x 1 agarg medrd 2095273 Apr 10 12:44 dmtcp_command
>>>>> -rwxr-xr-x 1 agarg medrd 1765945 Apr 10 12:44 dmtcp_coordinator
>>>>> -rwxr-xr-x 1 agarg medrd 712759 Apr 10 12:44 dmtcp_discover_rm
>>>>> -rwxr-xr-x 1 agarg medrd 2245864 Apr 10 12:44 dmtcp_launch
>>>>> -rwxr-xr-x 1 agarg medrd 10101 Apr 10 12:44 dmtcp_nocheckpoint
>>>>> -rwxr-xr-x 1 agarg medrd 2461225 Apr 10 12:44 dmtcp_restart
>>>>> -rwxr-xr-x 1 agarg medrd 5102 Apr 10 12:44 dmtcp_rm_loclaunch
>>>>> -rwxr-xr-x 1 agarg medrd 77645 Apr 10 12:44 dmtcp_srun_helper
>>>>> -rwxr-xr-x 1 agarg medrd 37119 Apr 10 12:44 dmtcp_ssh
>>>>> -rwxr-xr-x 1 agarg medrd 34125 Apr 10 12:44 dmtcp_sshd
>>>>> -rwxr-xr-x 1 agarg medrd 97427 Apr 10 12:44 mtcp_restart
>>>>>
>>>>> ./include:
>>>>> total 24
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 6 agarg medrd 4096 Apr 10 12:44 ..
>>>>> -rw-r--r-- 1 agarg medrd 13123 Apr 10 12:44 dmtcp.h
>>>>>
>>>>> ./lib:
>>>>> total 12
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 .
>>>>> drwxr-xr-x 6 agarg medrd 4096 Apr 10 12:44 ..
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 dmtcp
>>>>>
>>>>> ./lib/dmtcp:
>>>>> total 9164
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 ..
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:43 32
>>>>> -rwxr-xr-x 1 agarg medrd 17585 Apr 10 12:44 libdmtcp_alloc.so
>>>>> -rwxr-xr-x 1 agarg medrd 613809 Apr 10 12:44 libdmtcp_batch-queue.so
>>>>> -rwxr-xr-x 1 agarg medrd 11066 Apr 10 12:44 libdmtcp_dl.so
>>>>> -rwxr-xr-x 1 agarg medrd 3488068 Apr 10 12:44 libdmtcp_ipc.so
>>>>> -rwxr-xr-x 1 agarg medrd 17010 Apr 10 12:44 libdmtcp_modify-env.so
>>>>> -rwxr-xr-x 1 agarg medrd 606468 Apr 10 12:44 libdmtcp_pid.so
>>>>> -rwxr-xr-x 1 agarg medrd 359897 Apr 10 12:44 libdmtcp_ptrace.so
>>>>> -rwxr-xr-x 1 agarg medrd 3545880 Apr 10 12:44 libdmtcp.so
>>>>> -rwxr-xr-x 1 agarg medrd 492260 Apr 10 12:44 libdmtcp_timer.so
>>>>> -rwxr-xr-x 1 agarg medrd 163061 Apr 10 12:44 libdmtcp_unique-ckpt.so
>>>>>
>>>>> ./lib/dmtcp/32:
>>>>> total 16
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:43 .
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 ..
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:43 bin
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 lib
>>>>>
>>>>> ./lib/dmtcp/32/bin:
>>>>> total 84
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:43 .
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:43 ..
>>>>> -rwxr-xr-x 1 agarg medrd 72049 Apr 10 12:43 mtcp_restart-32
>>>>>
>>>>> ./lib/dmtcp/32/lib:
>>>>> total 12
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 .
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:43 ..
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:43 dmtcp
>>>>>
>>>>> ./lib/dmtcp/32/lib/dmtcp:
>>>>> total 8192
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:43 .
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:43 ..
>>>>> -rwxr-xr-x 1 agarg medrd 13682 Apr 10 12:43 libdmtcp_alloc.so
>>>>> -rwxr-xr-x 1 agarg medrd 541615 Apr 10 12:43 libdmtcp_batch-queue.so
>>>>> -rwxr-xr-x 1 agarg medrd 8405 Apr 10 12:43 libdmtcp_dl.so
>>>>> -rwxr-xr-x 1 agarg medrd 3122259 Apr 10 12:43 libdmtcp_ipc.so
>>>>> -rwxr-xr-x 1 agarg medrd 13585 Apr 10 12:43 libdmtcp_modify-env.so
>>>>> -rwxr-xr-x 1 agarg medrd 531681 Apr 10 12:43 libdmtcp_pid.so
>>>>> -rwxr-xr-x 1 agarg medrd 327213 Apr 10 12:43 libdmtcp_ptrace.so
>>>>> -rwxr-xr-x 1 agarg medrd 3164514 Apr 10 12:43 libdmtcp.so
>>>>> -rwxr-xr-x 1 agarg medrd 453328 Apr 10 12:43 libdmtcp_timer.so
>>>>> -rwxr-xr-x 1 agarg medrd 144674 Apr 10 12:43 libdmtcp_unique-ckpt.so
>>>>>
>>>>> ./share:
>>>>> total 16
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 6 agarg medrd 4096 Apr 10 12:44 ..
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 doc
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 man
>>>>>
>>>>> ./share/doc:
>>>>> total 12
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:44 ..
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 dmtcp-2.4.0-rc1
>>>>>
>>>>> ./share/doc/dmtcp-2.4.0-rc1:
>>>>> total 64
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 ..
>>>>> -rw-r--r-- 1 agarg medrd 2754 Apr 10 12:44 AUTHORS
>>>>> -rw-r--r-- 1 agarg medrd 2280 Apr 10 12:44 COPYING
>>>>> -rw-r--r-- 1 agarg medrd 26797 Apr 10 12:44 NEWS
>>>>> -rw-r--r-- 1 agarg medrd 20237 Apr 10 12:44 QUICK-START
>>>>>
>>>>> ./share/man:
>>>>> total 12
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 4 agarg medrd 4096 Apr 10 12:44 ..
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 man1
>>>>>
>>>>> ./share/man/man1:
>>>>> total 28
>>>>> drwxr-xr-x 2 agarg medrd 4096 Apr 10 12:44 .
>>>>> drwxr-xr-x 3 agarg medrd 4096 Apr 10 12:44 ..
>>>>> -rw-r--r-- 1 agarg medrd 3556 Apr 10 12:44 dmtcp.1.gz
>>>>> -rw-r--r-- 1 agarg medrd 1026 Apr 10 12:44 dmtcp_command.1.gz
>>>>> -rw-r--r-- 1 agarg medrd 1280 Apr 10 12:44 dmtcp_coordinator.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 dmtcp_discover_rm.1.gz -> dmtcp.1.gz
>>>>> -rw-r--r-- 1 agarg medrd 2093 Apr 10 12:44 dmtcp_launch.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 dmtcp_nocheckpoint.1.gz -> dmtcp.1.gz
>>>>> -rw-r--r-- 1 agarg medrd 1571 Apr 10 12:44 dmtcp_restart.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 dmtcp_rm_loclaunch.1.gz -> dmtcp.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 dmtcp_ssh.1.gz -> dmtcp.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 dmtcp_sshd.1.gz -> dmtcp.1.gz
>>>>> lrwxrwxrwx 1 agarg medrd 10 Apr 10 12:44 mtcp_restart.1.gz -> dmtcp.1.gz
>>>>>
>>>>>
>>>>> Regarding sharing the session, I have to check my organization
>>>>> policies and soon get back to you.
>>>>>
>>>>> Thanks for your help
>>>>>
>>>>> Regards
>>>>> Ankit
>>>>>
>>>>>
>>>>>
>>>>> On 04/09/2015 10:40 PM, Gene Cooperman wrote:
>>>>>> Hi Ankit,
>>>>>> Could you confirm for us what files are in your install directory
>>>>>> (especially in bin and lib)?
>>>>>> Also, are you able to run GDB on the core dump to see the stack?
>>>>>> Or alternatively, you could simply try:
>>>>>> gdb --args dmtcp_restart ckpt_*.dmtcp
>>>>>> (gdb) b execvp
>>>>>> (gdb) r
>>>>>> (gdb) b main
>>>>>> (gdb) c
>>>>>> [ Not all those gdb commands are necessary, but I like to go slow and
>>>>>> check each step. ]
>>>>>>
>>>>>> Finally, if you can set this up in a VM or on some outside machine where
>>>>>> we could get an account, then we'd be happy to login and check directly
>>>>>> what is happening.
>>>>>>
>>>>>> Best wishes,
>>>>>> - Gene
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 09, 2015 at 04:00:08PM +0530, Ankit Garg wrote:
>>>>>>> Hi,
>>>>>>> I followed the instructions at
>>>>>>> https://github.com/dmtcp/dmtcp/blob/master/doc/multi-arch.txt to build
>>>>>>> the package for multi architecture . Now I am trying to
>>>>>>> checkpoint/restart a 32-bit binary using the same package. On Restart,
>>>>>>> it crashes with the following error.
>>>>>>>
>>>>>>> % g++ test.cxx -m32
>>>>>>> % dmtcp_launch ./a.out
>>>>>>>
>>>>>>> Took checkpoint from the shell where dmtcp_coordinator is invoked
>>>>>>>
>>>>>>> then
>>>>>>>
>>>>>>> % dmtcp_restart ckpt*
>>>>>>>
>>>>>>>
>>>>>>> [26734] mtcp_restart.c:607 unmap_memory_areas_and_restore_vdso:
>>>>>>> ***WARNING: munmap(0xffffe000, 4096) failed: 22
>>>>>>> Segmentation fault (core dumped)
>>>>>>>
>>>>>>>
>>>>>>> Can anyone help on this. I am using latest release pulled from git.
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>> Ankit
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dmtcp-forum mailing list
>>>>>>> Dmtcp-forum@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum