On 2011-02-26 12:43, xming wrote:
> When trying to start X (and it loads qxl driver) the kvm process just crashes.
> 
> qemu-kvm 0.14
> 
> startup line
> 
> /usr/bin/kvm -name spaceball,process=spaceball -m 1024 -kernel
> /boot/bzImage-2.6.37.2-guest -append "root=/dev/vda ro" -smp 1 -netdev
> type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
> virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
> file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
> port=5957,disable-ticketing -monitor
> telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
> /var/run/kvm/spaceball.pid
> 
> host is running vanilla 2.6.37.1 on amd64.
> 
> Here is the bt
> 
> # gdb /usr/bin/qemu-system-x86_64
> GNU gdb (Gentoo 7.2 p1) 7.2
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-pc-linux-gnu".
> For bug reporting instructions, please see:
> <http://bugs.gentoo.org/>...
> Reading symbols from /usr/bin/qemu-system-x86_64...done.
> (gdb) set args -name spaceball,process=spaceball -m 1024 -kernel
> /boot/bzImage-2.6.37.2-guest -append "root=/dev/vda ro" -smp 1 -netdev
> type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
> virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
> file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
> port=5957,disable-ticketing -monitor
> telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
> /var/run/kvm/spaceball.pid
> (gdb) run
> Starting program: /usr/bin/qemu-system-x86_64 -name
> spaceball,process=spaceball -m 1024 -kernel
> /boot/bzImage-2.6.37.2-guest -append "root=/dev/vda ro" -smp 1 -netdev
> type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
> virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
> file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
> port=5957,disable-ticketing -monitor
> telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
> /var/run/kvm/spaceball.pid
> [Thread debugging using libthread_db enabled]
> do_spice_init: starting 0.6.0
> spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
> spice_server_add_interface: SPICE_INTERFACE_MOUSE
> [New Thread 0x7ffff4802710 (LWP 30294)]
> spice_server_add_interface: SPICE_INTERFACE_QXL
> [New Thread 0x7fffaacae710 (LWP 30295)]
> red_worker_main: begin
> handle_dev_destroy_surfaces:
> handle_dev_destroy_surfaces:
> handle_dev_input: start
> [New Thread 0x7fffaa4ad710 (LWP 30298)]
> [New Thread 0x7fffa9cac710 (LWP 30299)]
> [New Thread 0x7fffa94ab710 (LWP 30300)]
> [New Thread 0x7fffa8caa710 (LWP 30301)]
> [New Thread 0x7fffa3fff710 (LWP 30302)]
> [New Thread 0x7fffa37fe710 (LWP 30303)]
> [New Thread 0x7fffa2ffd710 (LWP 30304)]
> [New Thread 0x7fffa27fc710 (LWP 30305)]
> [New Thread 0x7fffa1ffb710 (LWP 30306)]
> [New Thread 0x7fffa17fa710 (LWP 30307)]
> reds_handle_main_link:
> reds_show_new_channel: channel 1:0, connected successfully, over Non Secure 
> link
> reds_main_handle_message: net test: latency 5.636000 ms, bitrate
> 11027768 bps (10.516899 Mbps)
> reds_show_new_channel: channel 2:0, connected successfully, over Non Secure 
> link
> red_dispatcher_set_peer:
> handle_dev_input: connect
> handle_new_display_channel: jpeg disabled
> handle_new_display_channel: zlib-over-glz disabled
> reds_show_new_channel: channel 4:0, connected successfully, over Non Secure 
> link
> red_dispatcher_set_cursor_peer:
> handle_dev_input: cursor connect
> reds_show_new_channel: channel 3:0, connected successfully, over Non Secure 
> link
> inputs_link:
> [New Thread 0x7fffa07f8710 (LWP 30312)]
> [New Thread 0x7fff9fff7710 (LWP 30313)]
> [New Thread 0x7fff9f7f6710 (LWP 30314)]
> [New Thread 0x7fff9eff5710 (LWP 30315)]
> [New Thread 0x7fff9e7f4710 (LWP 30316)]
> [New Thread 0x7fff9dff3710 (LWP 30317)]
> [New Thread 0x7fff9d7f2710 (LWP 30318)]
> qemu-system-x86_64:
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724:
> kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
> 
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x7ffff4802710 (LWP 30294)]
> 0x00007ffff5daa165 in raise () from /lib/libc.so.6
> (gdb)
> (gdb)
> (gdb)
> (gdb)
> (gdb) bt
> #0  0x00007ffff5daa165 in raise () from /lib/libc.so.6
> #1  0x00007ffff5dab580 in abort () from /lib/libc.so.6
> #2  0x00007ffff5da3201 in __assert_fail () from /lib/libc.so.6
> #3  0x0000000000436f7e in kvm_mutex_unlock ()
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724
> #4  qemu_mutex_unlock_iothread ()
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1737
> #5  0x00000000005e84ee in qxl_hard_reset (d=0x15d3080, loadvm=0)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/hw/qxl.c:665
> #6  0x00000000005e9f9a in ioport_write (opaque=0x15d3080, addr=<value
> optimized out>, val=0)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/hw/qxl.c:979
> #7  0x0000000000439d4e in kvm_handle_io (env=0x11a3e00)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/kvm-all.c:818
> #8  kvm_run (env=0x11a3e00)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:617
> #9  0x0000000000439f79 in kvm_cpu_exec (env=0x764b)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1233
> #10 0x000000000043b2d7 in kvm_main_loop_cpu (_env=0x11a3e00)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1419
> #11 ap_main_loop (_env=0x11a3e00)
>     at 
> /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1466
> #12 0x00007ffff77bb944 in start_thread () from /lib/libpthread.so.0
> #13 0x00007ffff5e491dd in clone () from /lib/libc.so.6
> (gdb)

That's a spice bug. In fact, there are a lot of
qemu_mutex_lock/unlock_iothread calls in that subsystem. I bet at least
a few of them can cause even more subtle problems.

Two general issues with dropping the global mutex like this:
 - The caller of mutex_unlock is responsible for maintaining
   cpu_single_env across the unlocked phase (that's what triggers the
   abort above); see the sketch after this list.
 - Dropping the lock in the middle of a callback is risky. It may
   enable re-entry into code sections that weren't designed for this
   (I'm skeptical about the side effects of
   qemu_spice_vm_change_state_handler - why drop the lock there?).

Spice requires a careful review regarding such issues. Alternatively, it
could pioneer the introduction of its own lock so that we can handle at
least the related I/O activities of the VCPUs without holding the global
mutex (but I bet it's not the simplest candidate for such a new scheme).
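
For illustration, such a private lock could look roughly like this (pure
sketch; qxl_lock and the handler name are invented, this is not existing
code):

    static QemuMutex qxl_lock;      /* hypothetical spice/qxl-only lock */

    static void qxl_ioport_write_sketch(PCIQXLDevice *d)
    {
        /* serialize against the red_worker thread with the private
         * lock; the global iothread mutex stays held throughout, so
         * cpu_single_env needs no special handling */
        qemu_mutex_lock(&qxl_lock);
        /* ... touch state shared with the spice worker ... */
        qemu_mutex_unlock(&qxl_lock);
    }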

Jan
