I sent the email below to smartos-discuss a week ago but haven't gotten a
response yet.  Since I suspect this issue applies to KVM on Illumos in
general, I'm sending a slightly modified copy of this email to this list as
well.  Please let me know if there's any additional information I can
provide, or if there are any diagnostics I should run (e.g. DTrace scripts)
next time this problem happens.

I ran into some issues last week with two of my KVM VMs running on SmartOS
20120629T002039Z temporarily becoming unresponsive.  I believe the same
problem happened again later that week, although I haven't been able to
confirm it.  One of the VMs is running Debian Linux Wheezy, and the other
Windows Server 2008 R2.  The Debian VM became unresponsive while performing
apt-get dist-upgrade.  While the VMs were unresponsive, executing
"vmadm info" would just hang, and each qemu process consumed 100% of one
CPU.  Eventually both VMs came back up (not at the same time) without
requiring a reboot, at which point the load went back to normal.  At one
point, after the Linux VM had come back up, running vmadm info on the
Windows VM would actually return a response, but it would say:
"Unable to get VM info for <uuid>: Unable to get info from vmadmd, query
returned 500."  Eventually this error message went away.  I've had VMs
running on this machine for a few weeks now and hadn't run into this
problem before (to my knowledge).

/var/adm/messages showed some messages during the time that the VMs froze:
2012-07-16T16:14:54.776339+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:14:54.776362+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0x50866b
2012-07-16T16:14:54.776374+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:14:54.776643+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0xfe75e1c0
2012-07-16T16:14:54.776656+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:14:54.776669+00:00 virt1 kvm: [ID 291337 kern.info] vcpu 1
received sipi with vector # 10
2012-07-16T16:14:54.776673+00:00 virt1 kvm: [ID 420667 kern.info]
kvm_lapic_reset: vcpu=ffffff04eb294000, id=1, base_msr= fee00800 PRIx64
base_address=fee00000
2012-07-16T16:25:38.233767+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0xff29b633
2012-07-16T16:25:38.233795+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:25:38.233862+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0x50866b
2012-07-16T16:25:38.233876+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:25:38.234174+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0xfe75ea30
2012-07-16T16:25:38.234191+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x3 data 0
2012-07-16T16:25:38.262789+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0x50866b
2012-07-16T16:25:38.262824+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:25:38.262874+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0x50866b
2012-07-16T16:25:38.262902+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x0 data 0
2012-07-16T16:25:38.263185+00:00 virt1 kvm: [ID 713435 kern.info] unhandled
rdmsr: 0xfe75ea30
2012-07-16T16:25:38.263203+00:00 virt1 kvm: [ID 391722 kern.info] unhandled
wrmsr: 0x3 data 0 .
2012-07-16T16:31:45+00:00 virt1 stop-F[3132]: [ID 702911 local0.error]
45.608 connect ECONNREFUSED .
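
In case it helps with triage, here is a rough sketch of how the "unhandled
rdmsr/wrmsr" messages could be tallied per MSR address.  It assumes that each
entry sits on a single line in the actual /var/adm/messages file (I believe
the wrapping above comes from my mail client), and the function name
summarize_msr is just something I made up for illustration.

```python
import re
from collections import Counter

# Matches one unwrapped kvm log entry such as:
#   2012-07-16T16:14:54.776339+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
# Assumption: the timestamp/host/tag layout is the same as in the excerpt above.
MSR_RE = re.compile(
    r"^\S+ \S+ kvm: \[ID \d+ kern\.info\] unhandled (rdmsr|wrmsr): (0x[0-9a-fA-F]+)"
)


def summarize_msr(lines):
    """Count unhandled MSR accesses, keyed by (operation, msr_address)."""
    counts = Counter()
    for line in lines:
        match = MSR_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def print_summary(path="/var/adm/messages"):
    """Print the counts for a log file, most frequent first."""
    with open(path, errors="replace") as log:
        for (op, msr), n in summarize_msr(log).most_common():
            print(f"{op} {msr}: {n}")
```

Calling print_summary() on the host would list each operation and MSR address
with the number of times it was logged, which might make it easier to tell
whether the same accesses show up every time a VM hangs.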

The server has a Xeon E3-1230 CPU on a SuperMicro X9SCM-F motherboard.

Alex


