Hi Rohan,

When I run the tests on the GitHub trunk, all the tests fail, the same as 
when I run them on the latest release, dmtcp-2.3.1. This differs from your 
observation that only some tests are failing.

Maybe the cause is that I am running OpenSUSE in a virtual machine 
(VirtualBox). I also have one real Linux machine at my disposal, which runs 
SUSE Linux Enterprise Server 11 (x86_64), kernel 3.0.76. On that machine DMTCP 
works without any problems. When I run a program on my virtual machine, 
checkpoint it, and then restart the checkpoint on the real Linux machine, it 
does work!  In the other direction it doesn't seem to work, so apparently 
something goes wrong when writing the checkpoint on a virtual machine. 

Is DMTCP supposed to work on a virtualized machine? 

Kind regards,
Wijnand

-----Original Message-----
From: Rohan Garg [mailto:rohg...@ccs.neu.edu] 
Sent: Monday, February 23, 2015 4:06 PM
To: Wijnand SUIJLEN
Cc: dmtcp-forum
Subject: Re: [Dmtcp-forum] dmtcp_restart (dmtcp 2.3.1) won't restart from 
checkpoint on OpenSUSE 13.2

Hi Wijnand,

I have been trying to reproduce this issue on my setup. I have tried this with 
DMTCP-2.3.1 and the latest DMTCP from the GitHub trunk 
(github.com/dmtcp/dmtcp).  Most of the `make check` tests pass in both 
cases, but some tests are failing. As you said, it could be a kernel issue; I 
have started looking into this. Here are the details:

$ lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE project
Description:    openSUSE 13.2 (Harlequin) (x86_64)
Release:        13.2
Codename:       Harlequin
$ uname -a
Linux linux.site 3.16.7-7-desktop #1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f27a) x86_64 x86_64 x86_64 GNU/Linux

In the meantime, could you please try the DMTCP trunk from GitHub and check if 
the issue persists? Also, try running the tests using
`DMTCP_GZIP=0 make check`.  I’ve noticed some kernel lockups when using gzip.
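
As a quick sanity check before restarting, the sketch below tests two things 
raised in this thread: that the checkpoint image is not empty, and whether it 
is gzip-compressed (gzip streams start with the bytes 1f 8b, which is what the 
hexdump quoted further down shows). The function name `ckpt_kind` is 
hypothetical and is not part of DMTCP.

```shell
#!/bin/sh
# Hypothetical helper (not part of DMTCP): classify a checkpoint image.
# Prints one of: "empty", "gzip", "not-gzip".
ckpt_kind() {
    file=$1
    # A missing or zero-length image would be consistent with
    # "only read 0 bytes" errors at restart time.
    if [ ! -s "$file" ]; then
        echo "empty"
        return 0
    fi
    # Read the first two bytes as hex; gzip streams begin with 1f 8b.
    magic=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
    else
        echo "not-gzip"
    fi
}
```

Calling `ckpt_kind` with the path of a checkpoint file prints a single word. 
An image written with `DMTCP_GZIP=0` would be expected to report `not-gzip`; 
that is an assumption, not something confirmed in this thread.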

Thanks,
Rohan

> On Feb 23, 2015, at 5:47 AM, Wijnand SUIJLEN <wijnand.suij...@huawei.com> 
> wrote:
> 
> Hi Rohan,
> 
> Thanks for looking into this. I did what you wrote: downloaded the source 
> tarball and ran "./configure && make && make check-dmtcp1". Indeed, the 
> test fails. For completeness' sake, I attached to this e-mail (build.log) a 
> log of a complete "./configure && make && make check". All tests are failing. 
> 
> The other attachment (testdir-contents) is a directory listing of the 
> location where the test script copies the checkpoints to. To prove 
> that the checkpoint files are not all-zeros, below are the first 10 
> lines of a hexdump of the dmtcp1 checkpoint:
> 
> wijnand@linux-6ea7:~/bla> hexdump -C /tmp/dmtcp-wijn...@linux-6ea7.site/dmtcp-autotest-845315162/ckpt_dmtcp1_1b03894b1c5e0f7-40000-54e6847e.dmtcp | head -n 10
> 00000000  1f 8b 08 00 7e 84 e6 54  04 03 ec dd 7d 6c 5d 67  |....~..T....}l]g|
> 00000010  7d 07 f0 63 a7 49 4c db  39 e6 65 9d 29 2f 75 a1  |}..c.IL.9.e.)/u.|
> 00000020  65 06 c9 76 ec bc 60 82  18 76 9b b4 37 5b 80 ac  |e..v..`..v..7[..|
> 00000030  24 23 ac 4b 6d 27 76 70  da c4 31 f5 0d 35 94 d1  |$#.Km'vp..1..5..|
> 00000040  8c b4 2c 16 2b 64 13 db  bc a9 62 a1 2a 34 9b d0  |..,.+d....b.*4..|
> 00000050  94 49 fb 23 e2 0f 9a ae  94 04 6d 62 65 05 2d 7b  |.I.#......mbe.-{|
> 00000060  11 ca 50 db 39 65 48 45  bc 79 08 c8 ce 73 ce f3  |..P.9eHE.y...s..|
> 00000070  a4 f6 95 9d bb 21 9a 32  fa 39 d5 bd df f3 7b 9e  |.....!.2.9....{.|
> 00000080  e7 bc 7d ee 71 ee cb b9  76 d7 bf 6d cb f5 9b 07  |..}.q...v..m....|
> 00000090  ae af 6c b8 fe 37 36 bf  63 e3 db b7 0c 6c 7c 5b  |..l..76.c....l|[|
> 
> 
> I can imagine that the error occurs because of a recent change in the Linux 
> kernel. Therefore, here is the output of uname on my system:
> wijnand@linux-6ea7:~/bla> uname -a
> Linux linux-6ea7.site 3.16.7-7-desktop #1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f27a) x86_64 x86_64 x86_64 GNU/Linux
> 
> Kind regards,
> Wijnand Suijlen
> 
> 
> -----Original Message-----
> From: Rohan Garg [mailto:rohg...@ccs.neu.edu]
> Sent: Friday, February 20, 2015 5:57 PM
> To: Wijnand SUIJLEN
> Cc: dmtcp-forum
> Subject: Re: [Dmtcp-forum] dmtcp_restart (dmtcp 2.3.1) won't restart 
> from checkpoint on OpenSUSE 13.2
> 
> Hi Wijnand,
> 
> Could you please try the following steps?
> 
> 1) Download the source tarball from: 
> http://sourceforge.net/projects/dmtcp/files/dmtcp-2.x/2.3.1/
> 2) ./configure && make && make check-dmtcp1
> 3) Verify that the test passes
> 
> Also, could you please verify that the checkpoint image is of non-zero size? 
> It could be that dmtcp_launch is failing to create a valid checkpoint image.
> 
> Thanks,
> Rohan
> 
> ----- Original Message -----
> From: "Kapil Arya" <kapil.arya...@gmail.com>
> To: "Wijnand SUIJLEN" <wijnand.suij...@huawei.com>
> Cc: "dmtcp-forum" <dmtcp-forum@lists.sourceforge.net>
> Sent: Friday, February 20, 2015 11:04:04 AM
> Subject: Re: [Dmtcp-forum] dmtcp_restart (dmtcp 2.3.1) won't restart 
> from checkpoint on OpenSUSE 13.2
> 
> Rohan/Jiajun,
> 
> Can you take a look at it? 
> 
> Best,
> Kapil
> 
> On Thu, Feb 19, 2015 at 11:40 AM, Wijnand SUIJLEN < 
> wijnand.suij...@huawei.com > wrote: 
> 
> 
> Hi,
> 
> I am running OpenSUSE 13.2 (64-bit) inside VirtualBox and I am trying to get 
> DMTCP 2.3.1 to work on the simple example 'dmtcp1.c' as supplied in the 
> source tar.gz distribution of the package (dmtcp-2.3.1/test/dmtcp1.c). There 
> are no complaints from DMTCP when writing the checkpoint. However, the 
> dmtcp_restart fails to restart the program with the error message "only read 
> 0 bytes instead of 4096 from checkpoint file". 
> 
> What am I doing wrong and what should I do to make it work? 
> 
> Some details: 
> I have tried it with an installation as built directly from the source 
> distribution and I tried it with the binary distribution as supplied by 
> OpenSUSE 13.2: It doesn't make any difference. 
> http://download.opensuse.org/distribution/13.2/repo/oss/suse/x86_64/dmtcp-2.3.1-2.2.2.x86_64.rpm
> http://download.opensuse.org/distribution/13.2/repo/oss/suse/x86_64/dmtcp-devel-2.3.1-2.2.2.x86_64.rpm
> 
> --- start console log ---
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> gcc -fPIC -o dmtcp1 dmtcp1.c
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_checkpoint ./dmtcp1
> 1 2 3 4 5 6 7 8 9 ^C
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_restart ckpt_dmtcp1_1b03894b1c5e0f7-40000-54e602b1.dmtcp
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile:
> only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:237 mtcp_readfile:
> failed to read after 10 tries in a row.
> Segmentation fault
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test>
> --- end console log ---
> 
> --- start dmtcp_coordinator log ---
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_coordinator
> dmtcp_coordinator (DMTCP) 2.3.1
> License LGPLv3+: GNU LGPL version 3 or later <http://gnu.org/licenses/lgpl.html>.
> This program comes with ABSOLUTELY NO WARRANTY.
> This is free software, and you are welcome to redistribute it under certain
> conditions; see COPYING file for details.
> (Use flag "-q" to hide this message.)
> 
> dmtcp_coordinator starting...
> Host: linux-6ea7.site (0.0.0.0)
> Port: 7779
> Checkpoint Interval: disabled (checkpoint manually instead)
> Exit on last client: 0
> Type '?' for help.
> 
> [27848] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
> hello_remote.from = 1b03894b1c5e0f7-27851-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating process Information after exec()'
> progname = dmtcp1
> msg.from = 1b03894b1c5e0f7-40000-54e602b1
> client->identity() = 1b03894b1c5e0f7-27851-54e602b1
> c
> [27848] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
> s.numPeers = 1
> [27848] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint; REASON='Incremented Generation'
> compId.generation() = 1
> [27848] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState; REASON='locking all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState; REASON='draining all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState; REASON='checkpointing all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState; REASON='building name service database'
> [27848] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState; REASON='entertaining queries now'
> [27848] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState; REASON='refilling all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState; REASON='restarting all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
> client->identity() = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:1096 in validateRestartingWorkerProcess; REASON='FIRST dmtcp_restart connection. Set numPeers. Generate timestamp'
> numPeers = 1
> curTimeStamp = 22789762212
> compId = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
> hello_remote.from = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
> client->identity() = 1b03894b1c5e0f7-40000-54e602b1
> --- end dmtcp_coordinator log ---
> 
> 
> Kind regards,
> Wijnand Suijlen
> 
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> 
> <build.log><testdir-contents>
