Package: qemu-guest-agent
Version: 1:4.2-3
Severity: important

[severity important since this might cause data loss in the VM]
Hi,

I see this bug on a single VM running Debian unstable on a host that is also running Debian unstable. The host is frequently sent to suspend-to-RAM, with measures in place to properly virsh suspend the VMs and to virsh resume them after the host wakes up again. After wakeup, qemu-agent-command is invoked with guest-exec to run hwclock -s, which re-initializes the VM's clock.

After such a suspend-resume cycle, when shutting down the host or issuing an explicit virsh shutdown, the VM doesn't shut down cleanly. syslog writes:

|Feb 21 19:01:15 spinturn qemu-ga: info: guest-shutdown called, mode: powerdown
|Feb 21 19:01:15 spinturn qemu-ga[914]: **
|Feb 21 19:01:15 spinturn qemu-ga[914]: ERROR:/build/qemu-KAaD7C/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
|Feb 21 19:01:15 spinturn qemu-ga[914]: Bail out! ERROR:/build/qemu-KAaD7C/qemu-4.2/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Succeeded.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.
|Feb 21 19:01:15 spinturn systemd[1]: qemu-guest-agent.service: Stop job pending for unit, delaying automatic restart.

The "Stop job pending" lines are written in an endless loop, at a rate of about 6500 (sic!) lines per second, interrupted by occasional "Looping too fast. Throttling execution a little." lines. In this state it is too late to log in again; the only way to recover is virsh destroy (which might lead to data loss in the VM, hence the severity 'important').
This also delays shutdown/restart of the host, since the host won't shut down while the VM is still running; it waits until the libvirt service units eventually destroy the VM.

If you happen to have an extra shell open on the VM, you can try a manual systemctl stop or systemctl start (both without visible effect). The only way to get the shutdown to continue gracefully is systemctl mask (which of course needs a corresponding unmask after the machine has been brought up again).

As far as I can see, this only happens when the shutdown is triggered externally; a shutdown -h now on the VM itself has not failed on me yet.

Since an assertion is actually failing here, this might be easy to debug, but it's beyond my knowledge.

In the failed state, a restart rate of several thousand attempts per second is way too fast. Please add appropriate delays to the systemd unit to prevent the log flood.

Greetings
Marc

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.5.2-zgsrv20080 (SMP w/6 CPU cores; PREEMPT)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8), LANGUAGE=en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages qemu-guest-agent depends on:
ii  init-system-helpers  1.57
ii  libc6                2.29-6
ii  libgcc-s1 [libgcc1]  10-20200204-1
ii  libgcc1              1:10-20200204-1
ii  libglib2.0-0         2.62.4-2
ii  libudev1             244.2-1
ii  lsb-base             11.1.0

qemu-guest-agent recommends no packages.

qemu-guest-agent suggests no packages.

-- no debconf information
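Regarding the request for delays in the systemd unit: a minimal sketch of what such delays could look like as a local drop-in override, assuming a hypothetical file /etc/systemd/system/qemu-guest-agent.service.d/override.conf (for example created with systemctl edit qemu-guest-agent.service). The concrete values are arbitrary assumptions, and it is untested whether they actually tame the "Stop job pending" loop shown above.

```ini
# Hypothetical drop-in: /etc/systemd/system/qemu-guest-agent.service.d/override.conf
# Values below are illustrative assumptions, not tested against this bug.

[Unit]
# Rate-limit starts: refuse further starts after 5 attempts within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Wait 5 seconds before each automatic restart attempt
# (the systemd default is 100ms).
RestartSec=5
```

After editing the file by hand, systemctl daemon-reload is needed for systemd to pick up the override.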