Hello,

Just an update on this: I've disabled Hyper-Threading in the BIOS for this machine, and this appears to have resolved the crashing.
So it seems there is a bug involving Hyper-Threading and the 686 kernel with PAE enabled (2.6.32-5-686-bigmem). It might be worth trying to track this down a bit further.

Cheers,
Blair

On Mon, Jan 23, 2012 at 9:42 AM, Blair Harrison <[email protected]> wrote:
> Hello,
>
> I've got a server which has now crashed a few times in a similar
> fashion. I even tried moving to new hardware, with a similar effect
> (though on the new hardware this seems to be happening more
> frequently), so this seems likely to be some interaction between
> Nagios and the kernel causing a soft lockup. Any ideas on how to
> resolve this would be appreciated.
>
> Unfortunately this is the only log I have of the event. The first
> event didn't produce any output like this, and I haven't got a record
> of the logs from the previous hardware, as I thought they may have
> been isolated incidents.
>
> The previous hardware was running Lenny rather than Squeeze, so this
> doesn't seem to be isolated to one particular version of anything.
>
> Let me know if there is any more information which would be of use.
>
> There are quite a few pieces of software running on here: RTG,
> Cricket, Nagios, smokeping, rancid.
>
> Debian 6.0.3
>
> Linux zzz-zzz 2.6.32-5-686-bigmem #1 SMP Wed Jan 11 13:17:56 UTC 2012 i686 GNU/Linux
>
> Jan 22 22:40:40 zzz-zzz kernel: [176617.648985] BUG: soft lockup - CPU#13 stuck for 61s! [nagios3:2070]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649040] Modules linked in: netconsole configfs joydev usbhid hid xt_multiport iptable_filter ip_tables x_tables 8021q garp stp loop snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma pcspkr evdev cdc_ether usbnet button processor serio_raw dca mii shpchp pci_hotplug i2c_i801 i2c_core ext4 mbcache jbd2 crc16 raid10 md_mod sd_mod crc_t10dif ata_generic uhci_hcd megaraid_sas ata_piix ehci_hcd libata usbcore scsi_mod nls_base thermal bnx2 thermal_sys [last unloaded: netconsole]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649078]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649082] Pid: 2070, comm: nagios3 Not tainted (2.6.32-5-686-bigmem #1) System x3550 M3 -[7944D2M]-
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649085] EIP: 0060:[<c10249bb>] EFLAGS: 00000202 CPU: 13
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649094] EIP is at native_flush_tlb_others+0x85/0xa6
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649096] EAX: 00000282 EBX: c14661ac ECX: c10200d8 EDX: 00000020
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649099] ESI: 00000005 EDI: 00000140 EBP: c14661a0 ESP: ee4c9a3c
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649101] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649104] CR0: 8005003b CR2: b758a376 CR3: 2eb7e000 CR4: 000006f0
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649106] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649108] DR6: ffff0ff0 DR7: 00000400
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649110] Call Trace:
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649116] [<c1024aa3>] ? flush_tlb_page+0x5d/0x65
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649120] [<c1023e90>] ? ptep_set_access_flags+0x59/0x63
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649125] [<c10a1040>] ? do_wp_page+0x3b9/0x7dd
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649131] [<c1031770>] ? finish_task_switch+0x76/0x95
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649135] [<c10b61a0>] ? kmem_cache_free+0x78/0xaf
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649138] [<c1031770>] ? finish_task_switch+0x76/0x95
> Jan 22 22:40:40 zzz-zzz kernel: [1766
> Jan 23 07:13:24 zzz-zzz syslog-ng[1807]: syslog-ng starting up; version='3.1.3'
>
> Cheers,
> Blair

-- 
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]
Archive: http://lists.debian.org/cahn0gtsh9fcf5te6bsmnqr5u8wfpxkkzx+5osyeykom+sx3...@mail.gmail.com

