Package: qemu-system-x86
Version: 2.1+dfsg-12~bpo70+1
Severity: important

Dear Maintainer,

We are seeing guests lock up/hang under QEMU. The hung guests sit at 100% CPU
usage. The problem seems to be storage/IO related, although there is not
necessarily high IO on the host at the time the guest hangs.

When a guest hangs, its VNC console becomes unresponsive, and the only way to
recover is to forcefully power the guest off and back on.

The guest shown below is running Debian Wheezy, but the problem seems to affect
other guest operating systems as well, such as Windows and CentOS.

The host storage holding the guest disk images is formatted with OCFS2 and runs
over iSCSI.

Guests configured with cache='writeback' whose qcow2 disk image was created
with qemu-img -o preallocation=metadata seem to be affected less frequently,
but are nonetheless still affected. We have also tried virtio-blk, virtio-scsi,
scsi and standard IDE as the guest's disk controller, but none of these seems
to improve things.

The QEMU command line for this guest is:

qemu-system-x86_64 -enable-kvm -name guest1 -S \
  -machine pc-i440fx-2.1,accel=kvm,usb=off -m 1024 -realtime mlock=off \
  -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 973bf27b-04f9-61dd-9272-de2467b599d5 \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/guest1.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown -boot strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device lsi,id=scsi0,bus=pci.0,addr=0x4 \
  -drive file=/mnt/vm/guest1.img,if=none,id=drive-scsi0-0-0,format=qcow2,cache=none \
  -device scsi-hd,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0,bootindex=1 \
  -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=42 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9a:36:10,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -device usb-tablet,id=input0 \
  -vnc 127.0.0.1:9 \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
  -msg timestamp=on


A backtrace of the QEMU process for a hung guest shows:

(gdb) bt
#0  0x00007f6ce44fed5c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6ce44fa3a9 in _L_lock_926 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f6ce44fa1cb in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f6cea9849f9 in ?? ()
#4  0x00007f6cea9313bb in ?? ()
#5  0x00007f6cea66ebed in main ()
(gdb) info threads
  Id   Target Id         Frame
  3    Thread 0x7f6cdac4a700 (LWP 29599) "qemu-system-x86" 0x00007f6ce4236de1 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
  2    Thread 0x7f6c993ff700 (LWP 29601) "qemu-system-x86" 0x00007f6ce44fc344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
* 1    Thread 0x7f6cea49b900 (LWP 29596) "qemu-system-x86" 0x00007f6ce44fed5c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0


(gdb) thread apply all bt

Thread 3 (Thread 0x7f6cdac4a700 (LWP 29599)):
#0  0x00007f6ce4236de1 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6cea931f1b in ?? ()
#2  0x00007f6cea933230 in ?? ()
#3  0x00007f6cea924fdd in ?? ()
#4  0x00007f6cea8925b6 in ?? ()
#5  0x00007f6cea899676 in ?? ()
#6  0x00007f6cea8998c5 in ?? ()
#7  0x00007f6cea891bb7 in ?? ()
#8  0x00007f6cea824929 in ?? ()
#9  0x00007f6cea824078 in ?? ()
#10 0x00007f6cea823f98 in ?? ()
#11 0x00007f6cea6b2c79 in ?? ()
#12 0x00007f6cea6b89bf in ?? ()
#13 0x00007f6cea679163 in ?? ()
#14 0x00007f6cea6b1cf5 in ?? ()
#15 0x00007f6cea69d25c in ?? ()
#16 0x00007f6ce44f7b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#17 0x00007f6ce424195d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f6c993ff700 (LWP 29601)):
#0  0x00007f6ce44fc344 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6cea984c19 in ?? ()
#2  0x00007f6cea920b7b in ?? ()
#3  0x00007f6cea920f50 in ?? ()
#4  0x00007f6ce44f7b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f6ce424195d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f6cea49b900 (LWP 29596)):
#0  0x00007f6ce44fed5c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6ce44fa3a9 in _L_lock_926 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f6ce44fa1cb in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f6cea9849f9 in ?? ()
#4  0x00007f6cea9313bb in ?? ()
#5  0x00007f6cea66ebed in main ()

Host system information:

Linux 3.14-0.bpo.2-amd64 #1 SMP Debian 3.14.15-2~bpo70+1 (2014-08-21) x86_64 GNU/Linux

qemu-kvm                           1:2.1+dfsg-12~bpo70+1                  amd64
qemu-system-x86                    1:2.1+dfsg-12~bpo70+1                  amd64


Though not quite the same situation, a web search turns up similar problems:

https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg01545.html
https://lists.nongnu.org/archive/html/qemu-devel/2010-05/msg01098.html

Unlike the hosts described in those reports, however, we are not using
multipath on our storage.



-- System Information:
Debian Release: 7.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i686
i386

Kernel: Linux 3.12-0.bpo.1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
