Package: qemu-guest-agent
Version: 1:4.2-3
Severity: important

[severity important since this might cause data loss in the VM]

Hi,

I see this bug on a single VM running Debian unstable on a host that is
also running Debian unstable. The host is frequently sent to
suspend-to-ram, with measures in place to properly virsh suspend the VMs
beforehand and to virsh resume them after the host wakes up again. After
wakeup, qemu-agent-command is invoked with guest-exec to run hwclock -s
and re-initialize the VM's clock.
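
For reference, the clock reset after wakeup is roughly equivalent to the
following sketch. It is not my exact script: the helper names, the
hwclock path and the domain name in the usage comment are assumptions
made for illustration.

```python
import json
import subprocess


def build_guest_exec(path, args):
    """Return the JSON payload for the guest agent's guest-exec command."""
    return json.dumps({
        "execute": "guest-exec",
        "arguments": {"path": path, "arg": list(args)},
    })


def reset_guest_clock(domain, hwclock="/sbin/hwclock"):
    """Ask the guest agent in `domain` to run `hwclock -s`.

    The hwclock path is an assumption; adjust it to the guest's layout.
    """
    payload = build_guest_exec(hwclock, ["-s"])
    # virsh qemu-agent-command <domain> <json> hands the JSON to qemu-ga.
    result = subprocess.run(
        ["virsh", "qemu-agent-command", domain, payload],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Usage on the host, after virsh resume (domain name is hypothetical):
#   reset_guest_clock("spinturn")
```

guest-exec only starts the process and returns a PID, so this does not
by itself confirm that hwclock succeeded.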

After such a suspend-resume cycle, when shutting down the host or
issuing an explicit virsh shutdown, the VM doesn't shut down cleanly.
The VM's syslog shows:

|Feb 21 19:01:15 spinturn qemu-ga: info: guest-shutdown called, mode: powerdown
|Feb 21 19:01:15 spinturn qemu-ga[914]: **
|Feb 21 19:01:15 spinturn qemu-ga[914]: ERROR:/build/qemu-KAaD7C/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
|Feb 21 19:01:15 spinturn qemu-ga[914]: Bail out! ERROR:/build/qemu-KAaD7C/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Succeeded.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.

The "Stop job pending" lines are written in an endless loop, at a rate
of about 6500 (sic!) lines per second, interspersed with occasional
"Looping too fast. Throttling execution a little." lines. In this state,
it is too late to log in again; the only way to recover is virsh destroy
(which might lead to data loss in the VM, hence the severity
'important').

This also delays shutdown/restart of the host, since the host won't shut
down while the VM is still running and instead waits until the libvirt
service units finally destroy the VM.

If one was smart enough to have an extra shell open on the VM, one can
try a manual systemctl stop or systemctl start of the unit (both without
visible effect). The only way to get the shutdown to continue gracefully
is systemctl mask (which of course needs a corresponding unmask after
the machine has been brought up again). As far as I can see, this only
happens when the shutdown is triggered externally; a shutdown -h now on
the VM itself has not yet failed on me.

Since we're actually failing an assertion here, this might be easy to
debug, but it's beyond my knowledge.

In the failed state, a restart rate of several thousand attempts per
second is way too fast. Please add appropriate delays to the systemd
unit to prevent the log flood.
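
A minimal sketch of the kind of drop-in I have in mind is below. The
file name and all values are assumptions on my part, and I have not
tested whether RestartSec= actually throttles the "Stop job pending"
loop shown above.

```ini
# /etc/systemd/system/qemu-guest-agent.service.d/restart-delay.conf
# (hypothetical file name; values are untested guesses)

[Unit]
# Give up after 5 start attempts within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Wait 5 seconds before each automatic restart.
RestartSec=5
```

After creating the file, a systemctl daemon-reload inside the VM is
needed for the override to take effect.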

Greetings
Marc

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.5.2-zgsrv20080 (SMP w/6 CPU cores; PREEMPT)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8), LANGUAGE=en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages qemu-guest-agent depends on:
ii  init-system-helpers  1.57
ii  libc6                2.29-6
ii  libgcc-s1 [libgcc1]  10-20200204-1
ii  libgcc1              1:10-20200204-1
ii  libglib2.0-0         2.62.4-2
ii  libudev1             244.2-1
ii  lsb-base             11.1.0

qemu-guest-agent recommends no packages.

qemu-guest-agent suggests no packages.

-- no debconf information
