Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows
it down a /whole lot/ ...), live migration started killing my Ubuntu precise
(kernel 3.2.x) guests, causing all of their vcpus to go into a busy loop. Once
(and only once) I've observed the guest eventually becoming responsive again,
with a clock nearly 600 years in the future and a negative uptime.
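
A side note on the magnitude of that jump: "nearly 600 years" is about the span an unsigned 64-bit nanosecond counter can hold. That is only an observation about the number, not a diagnosis. The sketch below just does the arithmetic; ns_to_years and SECONDS_PER_YEAR are hypothetical names for illustration, not kernel identifiers.

```c
#include <stdint.h>

/* Seconds in a Julian year (365.25 days); an assumption chosen for the estimate. */
#define SECONDS_PER_YEAR 31557600.0

/* Converts a nanosecond count to (Julian) years. */
static double ns_to_years(uint64_t ns)
{
	return (double)ns / 1e9 / SECONDS_PER_YEAR;
}
```

ns_to_years(UINT64_MAX) comes out at roughly 584.5 years.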

I haven't been able to dig up any previous threads about this problem, so my
gut instinct is that I've configured something wonky. Any pointers toward
/what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU features.
My longest-running VMs, from before I started passing the CPU capabilities
through to the guest, seem to migrate without issue.

It also seems to happen reliably when the guest has been running for a while;
it's easily reproducible with guests that have been up ~1 day, and I've
reproduced it in VMs with an uptime of ~20 hours. I haven't yet figured out a
lower bound, which makes the testing cycle a little longer for me.

The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
the current 3.2 kernel that Canonical distributes. Recent Fedora kernels
(3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
case exhaustively, and I haven't kept very good notes on the tests I have
done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04
and the associated 3.13 kernel. I had previously reproduced this with 12.04
running a raring-backport 3.11 kernel as well, but I (seemingly erroneously)
assumed it may have been a qemu userspace discrepancy.

I have been poring through a debugger attached to the guest via qemu's
gdbserver after it gets sent into a busy-spin, and the stack trace is:

(gdb) bt
#0 second_overflow (secs=<optimized out>) at /build/buildd/linux-3.2.0/kernel/time/ntp.c:407
#1 0xffffffff81095c75 in logarithmic_accumulation (offset=3831765322649889943, shift=9) at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:987
#2 0xffffffff81096042 in update_wall_time () at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1056
#3 0xffffffff81096e8d in do_timer (ticks=549606) at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1246
#4 0xffffffff8109d825 in tick_do_update_jiffies64 (now=...) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:77
#5 0xffffffff8109dda6 in tick_nohz_update_jiffies (now=...) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:145
#6 0xffffffff8109e378 in tick_check_nohz (cpu=0) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:713
#7 tick_check_idle (cpu=0) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#8 0xffffffff8106ff91 in irq_enter () at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#9 0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>) at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#10 <signal handler called>
#11 0xffffffffffffff10 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0 read_seqbegin (sl=<optimized out>) at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
89 /build/buildd/linux-3.2.0/include/linux/seqlock.h: No such file or directory.
(gdb) bt
#0 read_seqbegin (sl=<optimized out>) at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
#1 ktime_get () at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:268
#2 0xffffffff8109e355 in tick_check_nohz (cpu=1) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:709
#3 tick_check_idle (cpu=1) at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#4 0xffffffff8106ff91 in irq_enter () at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#5 0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>) at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#6 <signal handler called>
#7 0xffffffffffffff10 in ?? ()

If I continue, then re-stop the guest, logarithmic_accumulation() is still in
the stacktrace, with the same offset and shift; the line numbers indicate it's
stuck in the following loop:
	while (timekeeper.xtime_nsec >= nsecps) {
		int leap;
		timekeeper.xtime_nsec -= nsecps;
		xtime.tv_sec++;
		leap = second_overflow(xtime.tv_sec);
		xtime.tv_sec += leap;
		wall_to_monotonic.tv_sec -= leap;
		if (leap)
			clock_was_set_delayed();
	}
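
To make the cost of that loop explicit, here is a minimal user-space sketch that replays it and counts how often the body runs. It is a model under stated assumptions, not the kernel code: struct tk_model and count_second_rollovers are hypothetical names, the shift value is whatever the caller passes in (the guest's actual clocksource shift is not known here), and second_overflow() is assumed to return 0 so the leap-second bookkeeping is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the two timekeeper fields the loop touches. */
struct tk_model {
	uint64_t xtime_nsec; /* shifted nanoseconds, as in the quoted loop */
	uint32_t shift;      /* clocksource shift; caller-supplied assumption */
};

/*
 * Replays the quoted while loop and returns how many times its body ran.
 * Assumption: second_overflow() returns 0 (no leap second), so only the
 * subtraction and the tv_sec increment are modelled.
 */
static uint64_t count_second_rollovers(struct tk_model *tk, int64_t *tv_sec)
{
	uint64_t nsecps;
	uint64_t iterations = 0;

	assert(tk->shift <= 32); /* keeps NSEC_PER_SEC << shift inside 64 bits */
	nsecps = NSEC_PER_SEC << tk->shift;

	while (tk->xtime_nsec >= nsecps) {
		tk->xtime_nsec -= nsecps;
		(*tv_sec)++;
		iterations++;
	}
	return iterations;
}
```

With shift = 8 and xtime_nsec = (5 * NSEC_PER_SEC + 123) << 8, the function returns 5 and leaves 123 << 8 in xtime_nsec. In this model the trip count is simply xtime_nsec / nsecps, so the time spent in the loop grows linearly with whatever was accumulated into xtime_nsec before it was entered.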
Live migration is initiated through libvirt by virDomainMigrate with
flags=VIR_MIGRATE_LIVE, uri="tcp://$recv_hostname".
The guest is spawned by libvirtd with:
qemu-system-x86_64 -enable-kvm -name dog -S
  -machine pc-i440fx-trusty,accel=kvm,usb=off
  -cpu Nehalem,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
  -m 512 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1
  -uuid 55fd4c19-2477-40a5-988f-aaccd60b20dc -no-user-config -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dog.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
  -boot menu=on,strict=on
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
  -drive file=rbd:rbd/dog:id=libvirt:key=________________________________________:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
  -netdev tap,ifname=vm9_0,script=no,id=hostnet0,vhost=on,vhostfd=26
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:62:7a:9d,bus=pci.0,addr=0x3
  -vnc 0.0.0.0:9,password
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
  -incoming tcp:[::]:49152
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
The libvirt domain XML is:
<domain type='kvm' id='12'>
  <name>dog</name>
  <uuid>55fd4c19-2477-40a5-988f-aaccd60b20dc</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Nehalem</model>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <boot order='1'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='network' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='e04aa789-0bd7-07ac-cf10-78d8f52a4162'/>
      </auth>
      <source protocol='rbd' name='rbd/dog'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='ethernet'>
      <mac address='00:16:3e:62:7a:9d'/>
      <script path='no'/>
      <target dev='vm9_0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5909' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>