Bug#719675: [Pkg-libvirt-maintainers] Bug#719675: Live migration of KVM guests fails if it takes more than 30 seconds (large memory guests)

Christian Balzer Wed, 14 Aug 2013 17:48:45 -0700

On Wed, 14 Aug 2013 21:50:22 +0200 Guido Günther wrote:

> On Wed, Aug 14, 2013 at 04:49:42PM +0900, Christian Balzer wrote:
> > 
> > Package: libvirt0
> > Version: 0.9.12-11+deb7u1
> > Severity: important
> > 
> > Hello,
> > 
> > when doing a live migration using Pacemaker (the OCF VirtualDomain RA)
> > on a cluster with DRBD (active/active) backing storage everything
> > works fine with recently started (small memory footprint of about
> > 200MB at most) KVM guests. 
> > 
> > After inflating one guest to 2GB memory usage (memtester comes in handy
> > for that) the migration failed after 30 seconds, having managed to
> > migrate about 400MB in that time over the direct, dedicated GbE link
> > between my test cluster host nodes. 
> > 
> > libvirtd.log on the migration target node, migration start time is
> > 07:24:51 :
> > ---
> > 2013-08-13 07:24:51.807+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.886+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.888+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:24:51.948+0000: 31953: warning : qemuDomainObjEnterMonitorInternal:994 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
> > 2013-08-13 07:25:21.217+0000: 31950: warning : virKeepAliveTimer:182 : No response from client 0x1948280 after 5 keepalive messages in 30 seconds
> > 2013-08-13 07:25:31.224+0000: 31950: warning : qemuProcessKill:3813 : Timed out waiting after SIGTERM to process 15926, sending SIGKILL
> 
> This looks more like you're not replying via the keepalive protocol.
> What are you using to migrate VMs?
>  -- Guido
> 
As I said above, I am using the Pacemaker (heartbeat, OCF really) resource
agent, with SSH as the transport (and the only) option.
So the resulting migration URI should be something like:

qemu+ssh://targethost/system

The authorized_keys are of course properly distributed; again, this works
just fine with a small enough guest.

If there wasn't a proper two-way communication going on, shouldn't the
migration fail from the start?
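
For reference, a rough back-of-the-envelope check of the numbers above. This is
only a sketch: it assumes a constant transfer rate (no pages being re-dirtied
during migration) and assumes the keepalive window in the log comes from an
interval of 5 seconds and a count of 5, which is consistent with the
"5 keepalive messages in 30 seconds" message but is not confirmed from my
configuration. The function names are made up for illustration.

```python
def keepalive_window(interval_s, count):
    """Seconds of silence before the peer is declared dead.

    Assumption: one idle interval passes first, then `count` keepalive
    messages are sent one interval apart, so the window is
    interval * (count + 1).
    """
    return interval_s * (count + 1)


def estimated_transfer_seconds(guest_mb, observed_mb, observed_seconds):
    """Extrapolate total transfer time from the observed throughput.

    Assumption: the rate stays constant and no memory is re-sent.
    """
    return guest_mb * observed_seconds / observed_mb


# Assumed keepalive settings (interval 5 s, count 5).
window = keepalive_window(5, 5)

# 2GB guest, roughly 400MB moved in the first 30 seconds.
estimate = estimated_transfer_seconds(2048, 400, 30)

print("keepalive window: %d s" % window)
print("estimated migration time: %.1f s" % estimate)
print("migration outlasts the window: %s" % (estimate > window))
```

Under these assumptions a 2GB guest needs several times longer than the
keepalive window, while a guest of about 200MB finishes well inside it, which
matches what I am seeing.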

[snip]

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/

