Ciao Marcelo, sorry for getting back so late. Thanks for your patience. :-)
Marcelo Tosatti wrote:
>> I'm running a manually compiled KVM on CentOS 5.4. The KVM installation
>> has been carried over from CentOS 5.3, when KVM wasn't distributed with
>> the OS. (I tried to migrate to CentOS 5.4 native KVM support, but wasn't
>> able to get along with RedHat's interpretation of KVM.)
>>
>> The KVM version used is 88, on Kernel 2.6.18-128.7.1.el5, as KVM doesn't
>> seem to compile on CentOS' current 2.6.18-164.9.1.el5.
>>
>> Only on CentOS guests, I see very frequent "soft lockup" messages and
>> excessively hanging KVM instances.
>
> Can you please share some of the soft lockup messages.
>
> And how exactly are the VMs hanging?
They become unresponsive for a few seconds at a time, so it is more
"hiccuping" than hanging. It appears to be I/O-related, because it
happens most frequently when I work on the file system.
dmesg is full of these:
BUG: soft lockup - CPU#0 stuck for 10s! [kblockd/0:10]
Pid: 10, comm: kblockd/0
EIP: 0060:[<c056f931>] CPU: 0
EIP is at ide_outb+0x4/0x5
EFLAGS: 00000202 Not tainted (2.6.18-164.6.1.el5 #1)
EAX: 00000001 EBX: c07e2f80 ECX: 00000286 EDX: 0000c000
ESI: 00000011 EDI: 00000000 EBP: c07e3014 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f3c000 CR3: 12122000 CR4: 000006d0
[<c0573cab>] ide_dma_start+0x22/0x2e
[<c0576474>] ide_do_rw_disk+0x3b2/0x4a6
[<c056de34>] ide_do_request+0x533/0x6bf
[<c04de1b9>] freed_request+0x1d/0x37
[<c056d8d0>] ide_end_request+0xcc/0xd4
[<c056e221>] ide_intr+0x167/0x190
[<c044da39>] handle_IRQ_event+0x45/0x8c
[<c044db04>] __do_IRQ+0x84/0xd6
[<c044da80>] __do_IRQ+0x0/0xd6
[<c04074b2>] do_IRQ+0x99/0xc3
[<c0405946>] common_interrupt+0x1a/0x20
[<c04291ab>] __do_softirq+0x57/0x114
[<c04073cf>] do_softirq+0x52/0x9c
[<c04059d7>] apic_timer_interrupt+0x1f/0x24
[<c056f931>] ide_outb+0x4/0x5
[<c0573cab>] ide_dma_start+0x22/0x2e
[<c0576474>] ide_do_rw_disk+0x3b2/0x4a6
[<c056de34>] ide_do_request+0x533/0x6bf
[<c04e710f>] cfq_kick_queue+0x70/0x80
[<c0431e8a>] run_workqueue+0x78/0xb5
[<c04e709f>] cfq_kick_queue+0x0/0x80
[<c043273e>] worker_thread+0xd9/0x10b
[<c041e727>] default_wake_function+0x0/0xc
[<c0432665>] worker_thread+0x0/0x10b
[<c0434b55>] kthread+0xc0/0xeb
[<c0434a95>] kthread+0x0/0xeb
[<c0405c53>] kernel_thread_helper+0x7/0x10
=======================
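To put a number on "full of these", the reports could be tallied per task. The following is only a rough sketch: the function name count_soft_lockups is made up for illustration, and the pattern assumes the exact message format shown in the trace above.

```python
import re
from collections import Counter

# Matches lines such as:
#   BUG: soft lockup - CPU#0 stuck for 10s! [kblockd/0:10]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def count_soft_lockups(log_text):
    """Return a Counter mapping task name to number of soft lockup reports."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts
```

Fed with the saved output of dmesg from a guest, this would show whether kblockd/0 is the only task affected or whether other tasks are reported as well.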
>> Long-term stability is fine (several months uptime), but disturbed
>> by the hangs. The problem already was there on CentOS 5.3 as well.
>> With the Debian guests on the same host, I have never had any apparent
>> problems.
>
> Questions:
>
> - Is there significant swapping on the host?
> - Are you migrating vm's?
No migration and no swap activity. The host has plenty of idle RAM:
[r...@zulu ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7987       7904         82          0        667       5101
-/+ buffers/cache:       2135       5851
Swap:         1983          0       1983
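For anyone wondering how "plenty of idle RAM" fits with only 82 MB in the free column: most of the used memory is buffers and page cache, which the kernel can reclaim. A small sketch of that arithmetic, using the numbers from the "Mem:" line above (the function name summarize_free is just an illustration):

```python
def summarize_free(used, free, buffers, cached):
    """Derive the "-/+ buffers/cache" figures from the "Mem:" line of free -m.

    All values are in MB, as printed by free -m.
    """
    return {
        # Memory actually held by applications, without reclaimable cache.
        "used_by_applications": used - buffers - cached,
        # Memory that could be handed to applications if they asked for it.
        "available_to_applications": free + buffers + cached,
    }


summary = summarize_free(used=7904, free=82, buffers=667, cached=5101)
print(summary)
```

This gives 2136 MB used by applications and 5850 MB available. Both differ by 1 MB from the "-/+ buffers/cache" line, presumably because free rounds each column to whole megabytes separately.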
>> A number of google results suggest that I should work with CPU scaling
>> on the CentOS guest systems, but unfortunately, CPU scaling is not
>> available in my guests. So, here's my question: How do I enable CPU
>> scaling in KVM guests? Or is there any other measure against these soft
>> lockups that you can recommend?
>
> What probably was suggested is to disable cpu frequency scaling on the
> host. Please provide more details on the host system.
The host is a quad-core Xeon HP DL320 G5 running CentOS 5.4 with the
older kernel 2.6.18-128.7.1.el5.
There is no sign of CPU frequency scaling under /sys/devices/system/ on
the host:
[r...@zulu ~]# ls -l /sys/devices/system/cpu/cpu0
total 0
drwxr-xr-x 5 root root 0 Nov 7 13:47 cache
-r-------- 1 root root 4096 Jan 4 14:55 crash_notes
drwxr-xr-x 2 root root 0 Nov 7 13:48 topology
The file crash_notes contains the following value: 22792b400
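The same check can be run across all CPUs at once. This is a minimal sketch, assuming that an active cpufreq driver shows up as a cpufreq subdirectory under each cpuN directory in sysfs; the helper name cpus_with_cpufreq is hypothetical.

```python
import os


def cpus_with_cpufreq(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return the names of cpuN directories that contain a cpufreq subdirectory."""
    found = []
    if not os.path.isdir(sysfs_cpu_root):
        return found
    for entry in sorted(os.listdir(sysfs_cpu_root)):
        # Only look at cpu0, cpu1, ...; skip other entries such as "cpufreq".
        if entry.startswith("cpu") and entry[3:].isdigit():
            if os.path.isdir(os.path.join(sysfs_cpu_root, entry, "cpufreq")):
                found.append(entry)
    return found
```

On this host the call should return an empty list, matching the listing above, where cpu0 contains only cache, crash_notes and topology.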
Thanks for your help,
-martin
--
Martin Schmitt - Schmitt Systemberatung - http://www.scsy.de
DE 35415 Pohlheim, Gießener Str. 18
DE 65307 Bad Schwalbach, Am Bräunchesberg 9
Linux/UNIX - Internet - E-Mail Infrastructure - Antispam/Antivirus
- "What goes up, must come down. Ask any system administrator." -
