On Wed, 14 Aug 2013 21:50:22 +0200 Guido Günther wrote:

> On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:
> >
> > Package: libvirt0
> > Version: 0.9.12-11+deb7u1
> > Severity: important
> >
> > Hello,
> >
> > when doing a live migration using Pacemaker (the OCF VirtualDomain RA)
> > on a cluster with DRBD (active/active) backing storage everything
> > works fine with recently started (small memory footprint of about
> > 200MB at most) KVM guests.
> >
> > After inflating one guest to 2GB memory usage (memtester comes in handy
> > for that) the migration failed after 30 seconds, having managed to
> > migrate about 400MB in that time over the direct, dedicated GbE link
> > between my test cluster host nodes.
> >
> > libvirtd.log on the migration target node, migration start time is
> > 07:24:51:
> > ---
> > 2013-08-13 07:24:51.807+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.886+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.888+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : No response from client 0x1948280 after 5 keepalive messages in 30 seconds
> > 2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to process 15926, sending SIGKILL
>
> This looks more like you're not replying via the keepalive protocol.
> What are you using to migrate VMs?
>  -- Guido
>

As I said up there, the Pacemaker (heartbeat, OCF really) resource agent,
with SSH as transport (and only) option.
So the resulting migration URI should be something like:
qemu+ssh://targethost/system

Of course with properly distributed authorized_keys; again, this works
just fine with a small enough guest.

If there weren't proper two-way communication going on, shouldn't the
migration fail from the start?

[snip]

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/