Hello, DMTCP Team,

I ran into a problem when trying to checkpoint/restart a VM on the same
node using DMTCP and the KVM and TUN plugins from its contrib directory.

Here is some information on my setup:

1. DMTCP version 2.4.5
2. Built with: ./configure --enable-infiniband-support && make && make install
3. Ran make in contrib/kvm to build dmtcp_kvmhijack.so, and make in
    contrib/tun to build dmtcp_tunhijack.so
4. VM startup command:
    qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4096 \
        -hda /home/mig/vm1.qcow2 \
        -net nic,macaddr=52-54-00-12-32-2,model=virtio -net

The steps in my test are below. (I went through your Cluster'13 paper and
slides, and I believe this is how you run it; please let me know if I'm wrong.)

1. Run ./dmtcp_coordinator first.

2. In another terminal, run
    ./dmtcp_launch --with-plugin \
        qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4096 \
        -hda /home/mig/vm1.qcow2 \
        -net nic,macaddr=52-54-00-12-32-2,model=virtio -net

3. Then I trigger a checkpoint manually in the first terminal (by pressing
'c') and get the following output:
[6818] NOTE at dmtcp_coordinator.cpp:1291 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
     s.numPeers = 1
[6818] NOTE at dmtcp_coordinator.cpp:1293 in startCheckpoint; REASON='Incremented computationGeneration'
     compId.computationGeneration() = 1
[6818] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client
     client->identity() = 462a1597e64cd8e1-40000-57e4cbfc
     client->progname() = qemu-system-x86_64

4. The VM launches correctly. However, the manual checkpoint fails with the
following error, and no checkpoint file is generated:
[40000] ERROR at procselfmaps.cpp:214 in getNextArea; REASON='JASSERT(data[dataIdx++] == '\n') failed'
qemu-system-x86_64 (40000): Terminating...

I also get the same error after removing the VM's network configuration
(-net nic ... script=no) and the TUN plugin.
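For context, here is my rough understanding of what the failing check guards, written as a simplified C++ sketch. Each line of /proc/<pid>/maps has the form "start-end perms offset dev inode [name]" followed by a newline (see proc(5)). MapsArea and parseMapsLine are my own hypothetical names, and this is not the code in procselfmaps.cpp; I am only assuming that getNextArea() walks the same fields and then asserts that the next character is '\n'. Where DMTCP aborts, the sketch returns false.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical type for illustration only; NOT DMTCP's own area struct.
// Fields follow the /proc/<pid>/maps layout documented in proc(5).
struct MapsArea {
  std::uint64_t start = 0, end = 0, offset = 0, inode = 0;
  std::string perms;  // e.g. "rw-p"
  std::string dev;    // "major:minor", e.g. "08:01"
  std::string name;   // pathname, "[heap]", etc.; empty for anonymous maps
};

namespace {

// Reads the run of characters accepted by `pred` starting at data[idx].
// Returns the run (possibly empty) and leaves idx just past it.
template <typename Pred>
std::string readWhile(const std::string& data, std::size_t& idx, Pred pred) {
  std::size_t begin = idx;
  while (idx < data.size() && pred(static_cast<unsigned char>(data[idx]))) ++idx;
  return data.substr(begin, idx - begin);
}

// Consumes exactly one expected character; returns false otherwise.
bool expect(const std::string& data, std::size_t& idx, char c) {
  if (idx < data.size() && data[idx] == c) { ++idx; return true; }
  return false;
}

bool isHex(unsigned char c) { return std::isxdigit(c) != 0; }
bool isDec(unsigned char c) { return std::isdigit(c) != 0; }
bool isBlank(unsigned char c) { return c == ' ' || c == '\t'; }
bool notBlank(unsigned char c) { return !isBlank(c) && c != '\n'; }

}  // namespace

// Hypothetical helper: parses one maps line starting at data[idx] and
// advances idx past its trailing '\n'. Returns false if a field is malformed
// or if the line does not end in '\n'; that final check is my guess at what
// JASSERT(data[dataIdx++] == '\n') in procselfmaps.cpp enforces.
bool parseMapsLine(const std::string& data, std::size_t& idx, MapsArea& out) {
  std::string s = readWhile(data, idx, isHex);
  if (s.empty() || !expect(data, idx, '-')) return false;
  out.start = std::stoull(s, nullptr, 16);
  s = readWhile(data, idx, isHex);
  if (s.empty() || !expect(data, idx, ' ')) return false;
  out.end = std::stoull(s, nullptr, 16);
  out.perms = readWhile(data, idx, notBlank);
  if (out.perms.size() != 4 || !expect(data, idx, ' ')) return false;
  s = readWhile(data, idx, isHex);
  if (s.empty() || !expect(data, idx, ' ')) return false;
  out.offset = std::stoull(s, nullptr, 16);
  out.dev = readWhile(data, idx, notBlank);
  if (out.dev.empty() || !expect(data, idx, ' ')) return false;
  s = readWhile(data, idx, isDec);
  if (s.empty()) return false;
  out.inode = std::stoull(s, nullptr, 10);
  readWhile(data, idx, isBlank);  // padding before the optional name
  // The name runs to end of line and may contain spaces, e.g. "/tmp/x (deleted)".
  out.name = readWhile(data, idx, [](unsigned char c) { return c != '\n'; });
  return expect(data, idx, '\n');
}
```

In this simplified version the name field just runs to the end of the line, so the final newline check only fails on truncated input; I don't know whether DMTCP's parser treats the name field the same way.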

Can you please take a look at the problem?
Any help is really appreciated.

Dmtcp-forum mailing list