Hi Wijnand,

I have been trying to reproduce this issue on my setup, using both
DMTCP-2.3.1 and the latest DMTCP from the GitHub trunk
(github.com/dmtcp/dmtcp). Most of the `make check` tests pass in
both cases, but some tests fail. As you said, it could be a
kernel issue; I have started looking into this. Here are the details
of my system:

$ lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE project
Description:    openSUSE 13.2 (Harlequin) (x86_64)
Release:        13.2
Codename:       Harlequin
$ uname -a
Linux linux.site 3.16.7-7-desktop #1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f27a) x86_64 x86_64 x86_64 GNU/Linux

In the meantime, could you please try the DMTCP trunk from GitHub
and check whether the issue persists? Also, try running the tests with
`DMTCP_GZIP=0 make check`. I've noticed some kernel lockups
when using gzip.
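
If it helps while testing, here is a minimal sketch of a sanity check for a
checkpoint image before restarting from it. It is not part of DMTCP, and the
function names are made up for illustration. It checks that the file is
non-empty, that it is not all zero bytes, and whether it begins with the gzip
magic bytes 1f 8b (as the image in your hexdump does), which would indicate
that it was written with compression enabled.

```shell
#!/bin/sh
# Illustrative helpers only; these names are hypothetical and not part of DMTCP.

# Succeeds if the file exists and has non-zero size.
ckpt_nonempty() {
    [ -s "$1" ]
}

# Succeeds if the file contains at least one non-zero byte.
ckpt_not_all_zero() {
    # Delete all NUL bytes and count what is left.
    nonzero=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$nonzero" -gt 0 ]
}

# Succeeds if the file starts with the gzip magic bytes 1f 8b.
ckpt_is_gzip() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Print a one-line summary for a checkpoint image.
ckpt_summary() {
    if ! ckpt_nonempty "$1"; then
        echo "$1: missing or empty"
    elif ! ckpt_not_all_zero "$1"; then
        echo "$1: all zero bytes"
    elif ckpt_is_gzip "$1"; then
        echo "$1: gzip-compressed image"
    else
        echo "$1: uncompressed (or other format) image"
    fi
}
```

After sourcing these functions, `ckpt_summary <path to .dmtcp file>` prints one
of the four summaries. An image written with `DMTCP_GZIP=0` should not be
reported as gzip-compressed.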

Thanks,
Rohan

> On Feb 23, 2015, at 5:47 AM, Wijnand SUIJLEN <wijnand.suij...@huawei.com> 
> wrote:
> 
> Hi Rohan,
> 
> Thanks for looking into this. I did what you wrote: downloaded the source 
> tarball and ran "./configure && make && make check-dmtcp1". Indeed, the 
> test fails. For completeness' sake, I attached to this e-mail (build.log) a 
> log of a complete "./configure && make && make check". All tests are failing. 
> 
> The other attachment (testdir-contents) is a directory listing of the 
> location where the test script copies the checkpoints to. To prove that the 
> checkpoint files are not all zeros, below are the first 10 lines of a 
> hexdump of the dmtcp1 checkpoint:
> 
> wijnand@linux-6ea7:~/bla> hexdump -C /tmp/dmtcp-wijn...@linux-6ea7.site/dmtcp-autotest-845315162/ckpt_dmtcp1_1b03894b1c5e0f7-40000-54e6847e.dmtcp | head -n 10
> 00000000  1f 8b 08 00 7e 84 e6 54  04 03 ec dd 7d 6c 5d 67  |....~..T....}l]g|
> 00000010  7d 07 f0 63 a7 49 4c db  39 e6 65 9d 29 2f 75 a1  |}..c.IL.9.e.)/u.|
> 00000020  65 06 c9 76 ec bc 60 82  18 76 9b b4 37 5b 80 ac  |e..v..`..v..7[..|
> 00000030  24 23 ac 4b 6d 27 76 70  da c4 31 f5 0d 35 94 d1  |$#.Km'vp..1..5..|
> 00000040  8c b4 2c 16 2b 64 13 db  bc a9 62 a1 2a 34 9b d0  |..,.+d....b.*4..|
> 00000050  94 49 fb 23 e2 0f 9a ae  94 04 6d 62 65 05 2d 7b  |.I.#......mbe.-{|
> 00000060  11 ca 50 db 39 65 48 45  bc 79 08 c8 ce 73 ce f3  |..P.9eHE.y...s..|
> 00000070  a4 f6 95 9d bb 21 9a 32  fa 39 d5 bd df f3 7b 9e  |.....!.2.9....{.|
> 00000080  e7 bc 7d ee 71 ee cb b9  76 d7 bf 6d cb f5 9b 07  |..}.q...v..m....|
> 00000090  ae af 6c b8 fe 37 36 bf  63 e3 db b7 0c 6c 7c 5b  |..l..76.c....l|[|
> 
> 
> I can imagine that the error occurs because of a recent change in the Linux 
> kernel. Therefore, here is the output of uname on my system:
> wijnand@linux-6ea7:~/bla> uname -a
> Linux linux-6ea7.site 3.16.7-7-desktop #1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f27a) x86_64 x86_64 x86_64 GNU/Linux
> 
> Kind regards,
> Wijnand Suijlen
> 
> 
> -----Original Message-----
> From: Rohan Garg [mailto:rohg...@ccs.neu.edu] 
> Sent: Friday, February 20, 2015 5:57 PM
> To: Wijnand SUIJLEN
> Cc: dmtcp-forum
> Subject: Re: [Dmtcp-forum] dmtcp_restart (dmtcp 2.3.1) won't restart from 
> checkpoint on OpenSUSE 13.2
> 
> Hi Wijnand,
> 
> Could you please try the following steps?
> 
> 1) Download the source tarball from: 
> http://sourceforge.net/projects/dmtcp/files/dmtcp-2.x/2.3.1/
> 2) ./configure && make && make check-dmtcp1
> 3) Verify that the test passes
> 
> Also, could you please verify that the checkpoint image is of non-zero size? 
> It could be that dmtcp_launch is failing to create a valid checkpoint image.
> 
> Thanks,
> Rohan
> 
> ----- Original Message -----
> From: "Kapil Arya" <kapil.arya...@gmail.com>
> To: "Wijnand SUIJLEN" <wijnand.suij...@huawei.com>
> Cc: "dmtcp-forum" <dmtcp-forum@lists.sourceforge.net>
> Sent: Friday, February 20, 2015 11:04:04 AM
> Subject: Re: [Dmtcp-forum] dmtcp_restart (dmtcp 2.3.1) won't restart from 
> checkpoint on OpenSUSE 13.2
> 
> Rohan/Jiajun, 
> 
> Can you take a look at it? 
> 
> Best,
> Kapil 
> 
> On Thu, Feb 19, 2015 at 11:40 AM, Wijnand SUIJLEN < 
> wijnand.suij...@huawei.com > wrote: 
> 
> 
> Hi, 
> 
> I am running OpenSUSE 13.2 (64-bit) inside VirtualBox and I am trying to get 
> DMTCP 2.3.1 to work on the simple example 'dmtcp1.c' as supplied in the 
> source tar.gz distribution of the package (dmtcp-2.3.1/test/dmtcp1.c). There 
> are no complaints from DMTCP when writing the checkpoint. However, 
> dmtcp_restart fails to restart the program with the error message "only read 
> 0 bytes instead of 4096 from checkpoint file". 
> 
> What am I doing wrong and what should I do to make it work? 
> 
> Some details: 
> I have tried both an installation built directly from the source 
> distribution and the binary distribution supplied by OpenSUSE 13.2 
> (packages linked below). It doesn't make any difference. 
> http://download.opensuse.org/distribution/13.2/repo/oss/suse/x86_64/dmtcp-2.3.1-2.2.2.x86_64.rpm
> http://download.opensuse.org/distribution/13.2/repo/oss/suse/x86_64/dmtcp-devel-2.3.1-2.2.2.x86_64.rpm
>  
> 
> --- start console log ---
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> gcc -fPIC -o dmtcp1 dmtcp1.c 
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_checkpoint ./dmtcp1
> 1 2 3 4 5 6 7 8 9 ^C
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_restart 
> ckpt_dmtcp1_1b03894b1c5e0f7-40000-54e602b1.dmtcp
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:235 mtcp_readfile: only read 0 bytes instead of 4096 from checkpoint file
> [27865] mtcp_util.ic:237 mtcp_readfile: failed to read after 10 tries in a row.
> Segmentation fault
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test>
> --- end console log --- 
> 
> --- start dmtcp_coordinator log ---
> wijnand@linux-6ea7:~/bla/dmtcp-2.3.1/test> dmtcp_coordinator 
> dmtcp_coordinator (DMTCP) 2.3.1 License LGPLv3+: GNU LGPL version 3 or later 
> < http://gnu.org/licenses/lgpl.html >. 
> This program comes with ABSOLUTELY NO WARRANTY. 
> This is free software, and you are welcome to redistribute it under certain 
> conditions; see COPYING file for details. 
> (Use flag "-q" to hide this message.) 
> 
> dmtcp_coordinator starting... 
> Host: linux-6ea7.site (0.0.0.0)
> Port: 7779
> Checkpoint Interval: disabled (checkpoint manually instead)
> Exit on last client: 0
> Type '?' for help.
> 
> [27848] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
> hello_remote.from = 1b03894b1c5e0f7-27851-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating process Information after exec()'
> progname = dmtcp1
> msg.from = 1b03894b1c5e0f7-40000-54e602b1
> client->identity() = 1b03894b1c5e0f7-27851-54e602b1
> c
> [27848] NOTE at dmtcp_coordinator.cpp:1271 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
> s.numPeers = 1
> [27848] NOTE at dmtcp_coordinator.cpp:1273 in startCheckpoint; REASON='Incremented Generation'
> compId.generation() = 1
> [27848] NOTE at dmtcp_coordinator.cpp:615 in updateMinimumState; REASON='locking all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState; REASON='draining all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState; REASON='checkpointing all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState; REASON='building name service database'
> [27848] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState; REASON='entertaining queries now'
> [27848] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState; REASON='refilling all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:693 in updateMinimumState; REASON='restarting all nodes'
> [27848] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
> client->identity() = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:1096 in validateRestartingWorkerProcess; REASON='FIRST dmtcp_restart connection. Set numPeers. Generate timestamp'
> numPeers = 1
> curTimeStamp = 22789762212
> compId = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
> hello_remote.from = 1b03894b1c5e0f7-40000-54e602b1
> [27848] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
> client->identity() = 1b03894b1c5e0f7-40000-54e602b1
> --- end dmtcp_coordinator log --- 
> 
> 
> Kind regards, 
> Wijnand Suijlen 
> 
> ------------------------------------------------------------------------------
>  
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server 
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards 
> with Interactivity, Sharing, Native Excel Exports, App Integration & more 
> Get technology previously reserved for billion-dollar corporations, FREE 
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk 
> _______________________________________________ 
> Dmtcp-forum mailing list 
> Dmtcp-forum@lists.sourceforge.net 
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum 
> 
> 
> <build.log><testdir-contents>


