On Fri, Nov 21, 2008 at 4:55 PM, Glauber Costa <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 20, 2008 at 02:56:00AM +0100, Marcelo Tosatti wrote:
>>
>> On Wed, Nov 19, 2008 at 10:53:27PM +0100, Roland Lammel wrote:
>> > Actually it just happened again with the host running kvm-79. Host
>> > CPU is at 100%, but I'm still able to log in (it recovered from the
>> > first hang). However, I'm not able to start e.g. top.
>> > Writing small amounts to disk works (e.g. dd from /dev/zero to
>> > /tmp/test.file with 1MB), but writing 1GB already caused the instance
>> > to hang.
>> >
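To repeat the disk-write test without dd, here is a minimal Python sketch that writes a zero-filled file in 1 MiB chunks and then fsyncs it, roughly what `dd if=/dev/zero of=/tmp/test.file bs=1M count=N conv=fsync` does. The helper name `write_zeros` and the final fsync are illustrative choices, not part of the original report.

```python
import os

MIB = 1024 * 1024

def write_zeros(path: str, size_mib: int) -> int:
    """Write size_mib MiB of zero bytes to path in 1 MiB chunks, then fsync.

    Roughly equivalent to:
        dd if=/dev/zero of=path bs=1M count=size_mib conv=fsync
    Returns the number of bytes written.
    """
    chunk = bytes(MIB)  # 1 MiB of zero bytes
    written = 0
    with open(path, "wb") as f:
        for _ in range(size_mib):
            written += f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data through to the (virtio) disk
    return written
```

`write_zeros("/tmp/test.file", 1)` corresponds to the 1MB case that worked, and `write_zeros("/tmp/test.file", 1024)` to the 1GB case that hung the guest.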
>> > In the guest I see:
>> >
>> > The soft lockup of CPU#0 (only 1 CPU is assigned to the guest) seems to
>> > be either a consequence or the cause of the clock problem:
>> >
>> > bit:~# date
>> > Fri Dec  6 13:50:40 CET 1912
>> >
>> > [   57.348217] eth1: no IPv6 routers present
>> > [1266956800.037898] BUG: soft lockup - CPU#0 stuck for 1179869795s!
>>
>> Funny. Glauber, Gerd?
> So, can you provide a more informative dmesg? It doesn't need to be a full
> dmesg, but something more than these two messages would help, especially
> because they have printk timestamps on them. It seems to me that our
> sched_clock went crazy, since the timestamp in the second printk is so much
> bigger than the first, and never changes after that.
>
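The reasoning above can be checked mechanically by scanning the printk timestamps and flagging any step that goes backwards or leaps forward implausibly far. This is a minimal Python sketch: `find_jumps` is a hypothetical helper, not a kernel tool, and the one-hour threshold is an arbitrary assumption. For scale, the reported lockup of 1179869795s is roughly 37 years, so it cannot be a real stall.

```python
import re

# Matches the "[seconds.microseconds]" prefix that printk adds to each line.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def find_jumps(lines, max_gap=3600.0):
    """Return (previous, current) timestamp pairs that go backwards or
    jump forward by more than max_gap seconds (an illustrative threshold)."""
    jumps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # continuation lines without a timestamp
        ts = float(m.group(1))
        if prev is not None and (ts < prev or ts - prev > max_gap):
            jumps.append((prev, ts))
        prev = ts
    return jumps

log = [
    "[   57.348217] eth1: no IPv6 routers present",
    "[1266956800.037898] BUG: soft lockup - CPU#0 stuck for 1179869795s!",
    "[1266956800.037898] EIP is at finish_task_switch+0x20/0x78",
]
print(find_jumps(log))  # one jump: from ~57 s to ~1.27e9 s
```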

Of course. I have 3 guests, all running the same configuration
(Debian 32-bit); I'll now enable a 4th guest with Debian 64-bit to see if
that makes any difference. Hosts usually crash within 12-48 hours
(although one has been running for 50 hours right now).

Attached are the full dmesg of the host system and the kern.log of the
guest that logged the CPU lockup.

> Do you have an older version of both host/guest in which it used to work?

No, I have only just started with KVM. Until now I have been using Xen,
though not on this particular machine.

>>
>> > [logcheck:23795]
>> > [1266956800.037898] Modules linked in: ipv6 dm_snapshot dm_mirror
>> > dm_log dm_mod loop virtio_balloon serio_raw snd_pcsp virtio_net
>> > snd_pcm snd_timer snd soundcore psmouse snd_page_alloc evdev ext3 jbd
>> > mbcache ide_cd_mod cdrom ata_generic libata scsi_mod dock
>> > ide_pci_generic virtio_blk uhci_hcd usbcore piix ide_core virtio_pci
>> > thermal_sys
>> > [1266956800.037898]
>> > [1266956800.037898] Pid: 23795, comm: logcheck Not tainted (2.6.26-1-486 #1)
>> > [1266956800.037898] EIP: 0060:[<c0118c28>] EFLAGS: 00000246 CPU: 0
>> > [1266956800.037898] EIP is at finish_task_switch+0x20/0x78
>> > [1266956800.037898] EAX: c03cb620 EBX: de82e800 ECX: 00000003 EDX: de824000
>> > [1266956800.037898] ESI: 00000000 EDI: de824000 EBP: 00000000 ESP: c51e1f9c
>> > [1266956800.037898]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>> > [1266956800.037898] CR0: 8005003b CR2: 088a711c CR3: 04f08000 CR4: 00000690
>> > [1266956800.037898] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> > [1266956800.037898] DR6: ffff0ff0 DR7: 00000400
>> > [1266956800.037898]  [<c0118e22>] schedule_tail+0xe/0x39
>> > [1266956800.037898]  [<c0103646>] ret_from_fork+0x6/0x20
>> > [1266956800.037898]  =======================
>> > [1266957741.780134] INFO: task postdrop:24584 blocked for more than 120 seconds.
>> > [1266957741.780592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > [1266957741.781276] postdrop      D c014f55e     0 24584  24583
>> > [1266957741.781696]        de894000 00000086 00000000 c014f55e 49506ad8 1195233c de89418c 00013315
>> > [1266957741.782256]        00000000 bf229803 bf229803 d41593ec ddb2d400 c02a5b89 c03ec750 ddbf319c
>> > [1266957741.783040]        bf229803 c0121fc7 de894000 c03ec700 c02a5b84 74736f70 706f7264 642d7000
>> >
>> > Clock sources used are (for host and guest):
>> > host:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>> > acpi_pm
>> > host:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>> > acpi_pm jiffies tsc
>> >
>> > guest:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>> > kvm-clock
>> > guest:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>> > kvm-clock jiffies tsc
>> > bit:~#
>> >
>> > Commandline for starting is (kvm-79):
>> > /usr/local/bin/qemu-system-x86_64 -S -M pc -m 500 -smp 1 -name bit
>> > -monitor pty -no-acpi -boot c -drive
>> > file=/var/kvm/bit.img,if=virtio,index=0,boot=on -net
>> > nic,macaddr=24:42:53:21:52:45,vlan=0,model=virtio -net
>> > tap,fd=11,script=,vlan=0,ifname=vnet0 -serial tcp:127.0.0.1:50401
>> > -parallel none -usb -vnc 0.0.0.0:45001
>>
>> Why are you using -no-acpi? Perhaps switch the guest to acpi_pm to isolate
>> kvm-clock issues?

I've changed one guest to use ACPI and the acpi_pm clocksource.
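For reference, the clocksource can be inspected and, as root, changed at runtime through the same sysfs files queried with cat above; booting the guest kernel with `clocksource=acpi_pm` is the persistent alternative. The Python sketch below is illustrative: `read_clocksources` and `switch_clocksource` are hypothetical helpers, and the `base` parameter exists only so they can be exercised against a scratch directory. The switch refuses a clocksource the kernel does not list, which appears to be the case for acpi_pm in a guest started with -no-acpi (its list above shows only kvm-clock, jiffies and tsc).

```python
from pathlib import Path

# sysfs directory that exposes the clocksource selection on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksources(base: Path = CLOCKSOURCE_DIR):
    """Return (current, available) as the cat commands above show them."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available

def switch_clocksource(name: str, base: Path = CLOCKSOURCE_DIR) -> str:
    """Select clocksource `name` at runtime (requires root on a real system).

    Raises ValueError if the running kernel does not offer `name`.
    Returns the clocksource reported after the write.
    """
    _, available = read_clocksources(base)
    if name not in available:
        raise ValueError(f"{name!r} is not available (have: {', '.join(available)})")
    # Same effect as: echo acpi_pm > .../current_clocksource
    (base / "current_clocksource").write_text(name + "\n")
    current, _ = read_clocksources(base)
    return current
```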

Running now with:
/usr/local/bin/qemu-system-x86_64 -S -M pc -m 500 -smp 1 -name bit
-monitor pty -no-acpi -boot c -drive
file=/var/kvm/bit.img,if=virtio,index=0,boot=on -net
nic,macaddr=24:42:53:21:52:45,vlan=0,model=virtio -net
tap,fd=11,script=,vlan=0,ifname=vnet0 -serial tcp:127.0.0.1:50401
-parallel none -usb -vnc 0.0.0.0:45001

Actually, I had similar problems when using kvm-72 with ACPI. After
switching to libvirt for configuration, -no-acpi was set automatically,
and I simply left it that way to try it out. I've read mixed
recommendations concerning ACPI.
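Since libvirt generated the command line here, the difference between the two setups comes down to the -no-acpi flag. A sketch that assembles the same argument list in Python makes that explicit. `kvm_command` and its parameters are hypothetical; the values are copied from the command line above, and `tap,fd=11` only makes sense when the caller (normally libvirt) has already opened that tap file descriptor.

```python
import shlex

def kvm_command(name, image, mac, tap_fd, ifname, serial_port, vnc_display,
                memory_mb=500, acpi=True):
    """Build the qemu-system-x86_64 argument list used for the guest.

    With acpi=False the -no-acpi flag is added, which hides ACPI (and with
    it the acpi_pm clocksource) from the guest.
    """
    args = ["/usr/local/bin/qemu-system-x86_64", "-S", "-M", "pc",
            "-m", str(memory_mb), "-smp", "1", "-name", name,
            "-monitor", "pty"]
    if not acpi:
        args.append("-no-acpi")
    args += [
        "-boot", "c",
        "-drive", f"file={image},if=virtio,index=0,boot=on",
        "-net", f"nic,macaddr={mac},vlan=0,model=virtio",
        "-net", f"tap,fd={tap_fd},script=,vlan=0,ifname={ifname}",
        "-serial", f"tcp:127.0.0.1:{serial_port}",
        "-parallel", "none", "-usb", "-vnc", f"0.0.0.0:{vnc_display}",
    ]
    return args

print(shlex.join(kvm_command("bit", "/var/kvm/bit.img", "24:42:53:21:52:45",
                             11, "vnet0", 50401, 45001)))
```

In libvirt itself, the corresponding switch is the `<acpi/>` element inside the domain's `<features>` section.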

So should ACPI generally be enabled (for modern kernels and systems)?
Should ntpd be running on the guests, or just on the host, when using acpi_pm?


Cheers and thanks

+rl

-- 
Roland Lammel
QuikIT - IT solutions - flexible and fast
Web: http://www.quikit.at
Email: [EMAIL PROTECTED]

"Enjoy your job, make lots of money, work within the law. Choose any two."

Attachment: dmesg_host.gz
Description: GNU Zip compressed data

Attachment: kern.log.gz
Description: GNU Zip compressed data
