Sorry to repost under a different topic again, but it fits far better here.

I saw similar issues on a Debian lenny 2.6.26-1-amd64 64-bit KVM host
(currently running kvm-72) with Debian lenny 2.6.26-1-486 32-bit guests,
so the setup is similar to the Ubuntu setup.

I have configured ntpd on the host and in the guests, but of course
ntpd crashes after the severe clock jump described below.

The problem shows exactly the same symptoms, but the system is able to
recover from time to time, which allowed me to see the actual cause of
the problem: what causes the VM to hang seems to be a severe backward
time jump. The clock mostly ends up somewhere in Nov 1912, so it seems
to be correlated with a backward shift from the current time (e.g. an
int overflow).

When it is able to recover, I see a very big clock jump (for the
kernel timer it is a forward jump, but it seems to cause the system
clock to be in Nov 1912):
Nov 12 20:56:03 bit kernel: [   38.061596] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
Nov 13 06:25:03 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:48 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:48 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:51 bit kernel: [1266940721.901855] INFO: task postdrop:19268 blocked for more than 120 seconds.
Nov 30 06:25:51 bit kernel: [1266940721.902793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 06:25:51 bit kernel: [1266940721.905843] postdrop      D c014f55e     0 19268  19267
Nov 30 06:25:51 bit kernel: [1266940721.906697]        dd8f9c00 00000086 00000000 c014f55e 54541a81 1194f8cd dd8f9d8c 00015f63
Nov 30 06:25:51 bit kernel: [1266940721.907799]        00000000 be709d78 be709d78 c657c3c4 dec3b400 c02a5b89 dd92ded4 dda43ed4
Nov 30 06:25:51 bit kernel: [1266940721.908245]        be709d78 c0121fc7 dd8f9c00 c03ec700 c02a5b84 74736f70 706f7264 642d7000
Nov 30 06:25:51 bit kernel: [1266940721.909838] Call Trace:
Nov 30 06:25:51 bit kernel: [1266940721.910957]  [<c014f55e>] write_cache_pages+0x227/0x26d
Nov 30 06:25:51 bit kernel: [1266940721.911801]  [<c02a5b89>] schedule_timeout+0x69/0x86
Nov 30 06:25:51 bit kernel: [1266940721.912646]  [<c0121fc7>] process_timeout+0x0/0x5
Nov 30 06:25:51 bit kernel: [1266940721.913463]  [<c02a5b84>] schedule_timeout+0x64/0x86
Nov 30 06:25:51 bit kernel: [1266940721.914288]  [<e00852e4>] journal_stop+0x7d/0x12b [jbd]
Nov 30 06:25:51 bit kernel: [1266940721.915134]  [<c017bfcd>] __writeback_single_inode+0x13f/0x231
Nov 30 06:25:51 bit kernel: [1266940721.916017]  [<c014f5ee>] do_writepages+0x29/0x30
Nov 30 06:25:51 bit kernel: [1266940721.916834]  [<c014ace8>] __filemap_fdatawrite_range+0x65/0x70
Nov 30 06:25:51 bit kernel: [1266940721.917722]  [<e00fbeab>] ext3_sync_file+0x87/0x9c [ext3]
Nov 30 06:25:51 bit kernel: [1266940721.918580]  [<c017e6f0>] do_fsync+0x3d/0x7e
Nov 30 06:25:51 bit kernel: [1266940721.919356]  [<c017e74e>] __do_fsync+0x1d/0x2b
Nov 30 06:25:51 bit kernel: [1266940721.920142]  [<c010372f>] sysenter_past_esp+0x78/0xb9
Nov 30 06:25:51 bit kernel: [1266940721.920993]  =======================
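
As a rough sanity check of the int overflow guess, here is a small
Python sketch. It is only an illustration under assumptions that are
not confirmed by the log: that the wall clock was pushed forward by
about the kernel timer value shown above, that the result was then
truncated to a signed 32-bit value, and that the last sane log line
(Nov 13 06:25:03) is from 2008 and in UTC. The helper names are made up
for this example.

```python
from datetime import datetime, timedelta, timezone


def wrap_int32(value):
    """Interpret an integer as a signed 32-bit value (two's complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch (may be negative) to a UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)


# Assumption: wall clock shortly before the jump, taken from the last sane
# log line (Nov 13 06:25:03) and assumed to be in 2008, UTC.
before_jump = int(datetime(2008, 11, 13, 6, 25, 3, tzinfo=timezone.utc).timestamp())

# Kernel timer value from the hung-task message, in whole seconds.
kernel_timer = 1266940721

# Assumption: the wall clock advanced by the kernel timer value and then
# wrapped around as a signed 32-bit number of seconds.
wrapped = wrap_int32(before_jump + kernel_timer)
print(to_utc(wrapped))
```

With these assumptions the wrapped value lands in late November 1912,
which is at least consistent with what I see, but it does not prove
where the jump actually comes from.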

The guest is not really usable anymore, as all disk I/O (mostly writes
but also reads) tends to hang the system completely.

I have now manually compiled kvm-79 (including the kernel modules) and
am running 3 instances on it. None of them has crashed so far, but it
has only been 20 hours.

For me a ping check is actually enough to detect whether the host is
OK, and I'll probably use mon or something similar to just shut down
and restart the instance.
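
A minimal sketch of such a check in Python, not mon itself. It assumes
a Linux iputils ping (the -c and -W options), and the target names,
addresses and the restart hook are hypothetical placeholders; how an
instance is really shut down and restarted depends on how it was
started.

```python
import subprocess


def responds_to_ping(host, count=3, timeout_s=5):
    """Return True if ping gets at least one reply from host.

    Assumes Linux iputils ping: -c is the packet count, -W the
    per-reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_targets(targets, is_alive=responds_to_ping, restart=None):
    """Check every target and return the names of those that failed.

    targets maps a (hypothetical) instance name to the address to ping.
    If restart is given, it is called with the name of each failed
    instance; what it does is left to the caller.
    """
    failed = []
    for name, host in targets.items():
        if not is_alive(host):
            failed.append(name)
            if restart is not None:
                restart(name)
    return failed


# Example call with made-up names and addresses:
# check_targets({"guest1": "192.0.2.10"}, restart=print)
```

Run from cron every few minutes, this would roughly cover what I need
until the real cause is fixed.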

From what I remember, kvm_stat suddenly dropped all counters to 0 (the
chance increases with heavy disk I/O), with only minor block device
activity. I'll try to reproduce it and provide the kvm_stat output too.

Cheers

+rl


On Tue, Nov 18, 2008 at 10:34 PM, Marcelo Tosatti <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 14, 2008 at 03:34:57PM +0000, Chris Jones wrote:
>> I've have setup a couple virtual machines and they work great... for anywhere
>> between 2-24 hours.  But then, for no reason I can determine, they just go
>> 100% busy and stop responding.
>
> Hi Chris,
>
> Can you please reproduce with kvm-79 and provide "kvm_stat -l"
> (kvm-79/kvm_stat) output (for 10s or so).

-- 
Roland Lammel
QuikIT - IT Lösungen - flexibel und schnell
Web: http://www.quikit.at
Email: [EMAIL PROTECTED]

"Enjoy your job, make lots of money, work within the law. Choose any two."