[email protected] wrote:
> Hi Debian people ;-),
>
> After having some issues with Fedora last year I decided to reinstall all my
> servers to Debian 10. I'm super happy with Debian except for one recurring issue
> I have with QEMU-KVM hosts that is very difficult to reproduce so I would
> like to discuss it first before I open a new bug. Could you please discuss it
> with me? ;-)
>
> I noticed that when I run VMs for a long period of time (a couple of days)
> one or more VMs quite often get stuck. It is not possible to connect to the
> stuck VMs using virt-manager and their serial consoles don't respond.
First question: when they are just a few minutes old, does the
serial console work?
> It is not possible to shut them down ("virsh shutdown vm"). Sometimes the
> stuck VMs can be powered down ("virsh destroy vm") but in most cases "virsh
> destroy" doesn't work. In that case the only thing to do is to shut down the
> rest of the running VMs (those that do respond) and reboot the host.
Second question: when the VMs are a few minutes old, does virsh
shutdown work?
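
A minimal sketch of the shutdown check on a freshly booted guest, under
some assumptions: "vm" is the placeholder name from the quoted commands,
the 60-second wait is arbitrary, and the VIRSH variable is a hypothetical
override added only so the command can be swapped out. Note that it
really does shut the guest down.

```shell
#!/bin/sh
# Sketch: does a freshly booted guest react to "virsh shutdown"?
# VIRSH is a hypothetical override; by default the real virsh is used.
VIRSH="${VIRSH:-virsh}"

check_fresh_vm() {
    dom="$1"
    # Print the state libvirt reports for the domain (e.g. "running").
    state=$("$VIRSH" domstate "$dom") || return 1
    echo "state: $state"
    # Ask the guest to shut down; the guest has to cooperate.
    "$VIRSH" shutdown "$dom" >/dev/null || return 1
    # Poll for up to 60 seconds (arbitrary limit) until it is off.
    i=0
    while [ "$i" -lt 60 ]; do
        if [ "$("$VIRSH" domstate "$dom")" = "shut off" ]; then
            echo "shutdown ok"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "still not shut off after 60s"
    return 1
}
```

Call it as "check_fresh_vm vm" a few minutes after starting the guest.
The serial console is interactive, so test it by hand with
"virsh console vm".
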
> When I reboot/shutdown the host the reboot/shutdown takes approx. 30min.
>
> This is what it looks like during the reboot / shutdown:
> ~~~
> [ ***] (1 of 4) A stop job is running for /dev/dm-1 (18min 6s / no limit)
> ~~~
You probably want to change that "no limit" timeout to 1 minute or so.
> As I mentioned it is very difficult to reproduce it since it takes days to
> get into that situation. VMs that are more likely to get stuck are VMs that:
>
> a) have larger virtual disks
> b) more intensive storage use (use more IOPs)
> c) have more vCPUs
>
> The problem is that VMs with larger disks usually use more IOPs and also have
> more vCPUs so it is difficult to say what exactly is the issue. Based on my
> testing I think that fewer vCPUs make it less likely to get stuck but it's
> difficult to say...
>
> The only thing I'm confident of is that the problem is not HW related - it
> happened both on my SuperMicro with XEON E5 v2 and on other hardware with
> Intel i7 7th gen.
Are the VMs set up to match the host hardware, or are they
fully emulated?
And, especially: if they are not using virtio for disk and
network devices, try that ASAP.
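
One way to answer both questions is to look at the domain XML. The
helper below is only a sketch: the function name is made up, "vm" is a
placeholder, and it assumes the usual libvirt layout where the disk bus
sits on the <target> element and the NIC model on a <model> element.

```shell
#!/bin/sh
# Sketch: summarise how a libvirt domain is configured.
# Reads domain XML on stdin, e.g.:  virsh dumpxml vm | show_virt_config
show_virt_config() {
    # Keep only the lines naming the hypervisor type, the CPU mode,
    # the disk bus and the device model, and strip the indentation.
    # Video devices also use <model type=...>, so expect extra lines.
    grep -E "<domain type=|<cpu mode=|<target dev=.* bus=|<model type=" \
        | sed 's/^[[:space:]]*//'
}
```

In the output, type='kvm' on the domain line means hardware
virtualisation is in use, while type='qemu' means full emulation. A cpu
mode of host-passthrough or host-model follows the host CPU. Disks and
network interfaces that show something other than virtio (for example
sata, ide or e1000) are the ones to switch.
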
> Btw. this has never happened on my laptop that has the same configuration as
> the server (+Desktop Env.) but I reboot it multiple times a week so that might
> be an answer...
Not so much an answer as an explanation why you haven't seen it,
but, sure, that's plausible.
-dsr-