Sorry to repost under a different topic again, but it fits far better here.
I saw similar issues when running on a Debian lenny 2.6.26-1-amd64 64-bit KVM host (currently running kvm-72) with Debian lenny 2.6.26-1-486 32-bit guests, so the setup is similar to the Ubuntu setup. I have configured ntpd on the host and in the guests, but of course ntpd crashes after such a severe clock jump.

The problem shows exactly the same symptoms, but the system is able to recover from time to time, which allowed me to see the actual cause: it seems to be a severe backward time jump that causes the VM to hang. The clock mostly ends up somewhere in Nov 1912, so it seems to be correlated with a backward shift from the current time (e.g. an int overflow). In the cases where it was able to recover I saw a very big clock jump (for the kernel timer it is a forward jump, but it seems to put the system clock in Nov 1912).

Nov 12 20:56:03 bit kernel: [ 38.061596] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
Nov 13 06:25:03 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:48 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:48 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.
Nov 30 06:25:51 bit kernel: [1266940721.901855] INFO: task postdrop:19268 blocked for more than 120 seconds.
Nov 30 06:25:51 bit kernel: [1266940721.902793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 06:25:51 bit kernel: [1266940721.905843] postdrop D c014f55e 0 19268 19267
Nov 30 06:25:51 bit kernel: [1266940721.906697] dd8f9c00 00000086 00000000 c014f55e 54541a81 1194f8cd dd8f9d8c 00015f63
Nov 30 06:25:51 bit kernel: [1266940721.907799] 00000000 be709d78 be709d78 c657c3c4 dec3b400 c02a5b89 dd92ded4 dda43ed4
Nov 30 06:25:51 bit kernel: [1266940721.908245] be709d78 c0121fc7 dd8f9c00 c03ec700 c02a5b84 74736f70 706f7264 642d7000
Nov 30 06:25:51 bit kernel: [1266940721.909838] Call Trace:
Nov 30 06:25:51 bit kernel: [1266940721.910957] [<c014f55e>] write_cache_pages+0x227/0x26d
Nov 30 06:25:51 bit kernel: [1266940721.911801] [<c02a5b89>] schedule_timeout+0x69/0x86
Nov 30 06:25:51 bit kernel: [1266940721.912646] [<c0121fc7>] process_timeout+0x0/0x5
Nov 30 06:25:51 bit kernel: [1266940721.913463] [<c02a5b84>] schedule_timeout+0x64/0x86
Nov 30 06:25:51 bit kernel: [1266940721.914288] [<e00852e4>] journal_stop+0x7d/0x12b [jbd]
Nov 30 06:25:51 bit kernel: [1266940721.915134] [<c017bfcd>] __writeback_single_inode+0x13f/0x231
Nov 30 06:25:51 bit kernel: [1266940721.916017] [<c014f5ee>] do_writepages+0x29/0x30
Nov 30 06:25:51 bit kernel: [1266940721.916834] [<c014ace8>] __filemap_fdatawrite_range+0x65/0x70
Nov 30 06:25:51 bit kernel: [1266940721.917722] [<e00fbeab>] ext3_sync_file+0x87/0x9c [ext3]
Nov 30 06:25:51 bit kernel: [1266940721.918580] [<c017e6f0>] do_fsync+0x3d/0x7e
Nov 30 06:25:51 bit kernel: [1266940721.919356] [<c017e74e>] __do_fsync+0x1d/0x2b
Nov 30 06:25:51 bit kernel: [1266940721.920142] [<c010372f>] sysenter_past_esp+0x78/0xb9
Nov 30 06:25:51 bit kernel: [1266940721.920993] =======================

The guest is not really usable anymore, as all disk I/O (mostly writes, but also reads) tends to hang the system completely.

I have now manually compiled kvm-79 (including the kernel modules) and am running 3 instances on it. None of them has crashed so far, but it has only been 20 hours.
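
In the log excerpt above, the bracketed kernel timestamp goes from about 38 seconds to 1266940721 seconds (roughly 40 years of uptime) while the syslog clock only advances by about 17 days. For anyone who wants to scan their own guest logs for the same pattern, here is a minimal Python sketch, not a tested tool. The function name `find_clock_jumps`, the `year` default and the 60 second `tolerance` are my own assumptions, and it expects the syslog line layout shown in the excerpt.

```python
import re
from datetime import datetime

# Syslog prefix ("Nov 30 06:25:51 bit kernel:") followed by the bracketed
# kernel timestamp. Lines without that timestamp (e.g. imklog) do not match.
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*(\d+\.\d+)\]")

def find_clock_jumps(lines, year=2008, tolerance=60.0):
    """Return (prev_uptime, uptime, excess_seconds) for each pair of
    consecutive timestamped lines where the kernel timestamp advanced by
    more (or less) than the syslog wall clock did, beyond `tolerance`.

    `year` is an assumption: syslog lines carry no year of their own.
    """
    jumps = []
    previous = None  # (wall clock, kernel timestamp) of the last match
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue
        wall = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        uptime = float(match.group(2))
        if previous is not None:
            wall_delta = (wall - previous[0]).total_seconds()
            uptime_delta = uptime - previous[1]
            if abs(uptime_delta - wall_delta) > tolerance:
                jumps.append((previous[1], uptime, uptime_delta - wall_delta))
        previous = (wall, uptime)
    return jumps

# Four lines taken from the excerpt above.
sample = [
    "Nov 12 20:56:03 bit kernel: [ 38.061596] warning: `ntpd' uses 32-bit capabilities (legacy support in use)",
    "Nov 13 06:25:03 bit kernel: imklog 3.18.2, log source = /proc/kmsg started.",
    "Nov 30 06:25:51 bit kernel: [1266940721.901855] INFO: task postdrop:19268 blocked for more than 120 seconds.",
    "Nov 30 06:25:51 bit kernel: [1266940721.902793] \"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\" disables this message.",
]

for prev_uptime, uptime, excess in find_clock_jumps(sample):
    print(f"kernel timestamp went from {prev_uptime} to {uptime} "
          f"({excess:.0f}s more than the wall clock)")
```

Comparing against the syslog wall clock, rather than using a fixed threshold on the kernel timestamp alone, avoids flagging ordinary gaps between log entries. It would miss a jump in which both clocks moved by the same amount.
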
For me the ping check is actually enough to detect whether the host is OK, and I'll probably use mon or something similar to just shut down and restart the instance.

From what I remember, kvm_stat suddenly showed all counters dropping to 0, with only minor block device activity remaining (the chance of this increases with heavy disk I/O). I'll try to reproduce it and provide the kvm_stat output too.

Cheers
+rl

On Tue, Nov 18, 2008 at 10:34 PM, Marcelo Tosatti <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 14, 2008 at 03:34:57PM +0000, Chris Jones wrote:
>> I've have setup a couple virtual machines and they work great... for anywhere
>> between 2-24 hours. But then, for no reason I can determine, they just go 100%
>> busy and stop responding.
>
> Hi Chris,
>
> Can you please reproduce with kvm-79 and provide "kvm_stat -l"
> (kvm-79/kvm_stat) output (for 10s or so).
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Roland Lammel
QuikIT - IT Lösungen - flexibel und schnell
Web: http://www.quikit.at
Email: [EMAIL PROTECTED]

"Enjoy your job, make lots of money, work within the law. Choose any two."
