Hello,

Just an update on this: I've disabled Hyper-Threading in the BIOS for
this machine, and this appears to have resolved the crashing.

So it seems there is a bug here involving Hyper-Threading and the 686
kernel with PAE enabled. It might be worth trying to track this down a
bit further.
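
For anyone wanting to confirm that the BIOS change actually took effect,
here is a rough sketch (not something taken from this machine's setup).
It assumes an x86 Linux kernel whose /proc/cpuinfo exposes the
"siblings" and "cpu cores" fields; the function name is just
illustrative. If a package reports more siblings than cores,
Hyper-Threading is most likely still on.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True if any CPU entry reports more siblings than cores.

    Assumption: the text has the usual /proc/cpuinfo layout on x86,
    i.e. blank-line-separated blocks of "key : value" lines that
    include "siblings" and "cpu cores".
    """
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        try:
            siblings = int(fields["siblings"])
            cores = int(fields["cpu cores"])
        except (KeyError, ValueError):
            # Fields missing or malformed (e.g. a non-x86 kernel): skip.
            continue
        if siblings > cores:
            return True
    return False

# Typical use on the affected box:
#   with open("/proc/cpuinfo") as f:
#       print(hyperthreading_enabled(f.read()))
```

With Hyper-Threading disabled, the two counts should match and the
function should return False.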

Cheers,
Blair


On Mon, Jan 23, 2012 at 9:42 AM, Blair Harrison <[email protected]> wrote:
> Hello,
>
> I've got a server which has now crashed a few times in a similar
> fashion. I even tried moving to new hardware, with similar effect
> (though on the new hardware this seems to be happening more
> frequently), so this seems likely to be some interaction between
> nagios and the kernel causing a soft lockup. Any ideas on how to
> resolve this would be appreciated. Unfortunately this is the only log
> I have of the event: the first event didn't produce any output like
> this, and I haven't got a record of the logs from the previous
> hardware, as I thought they may have been isolated incidents.
>
> The previous hardware was running Lenny rather than Squeeze, so this
> does not seem to be isolated to one particular version of anything.
>
> Let me know if there is any more information which would be of use.
>
> There are quite a few bits of software running on here: RTG, Cricket,
> Nagios, smokeping, rancid.
>
> Debian 6.0.3
>
> Linux zzz-zzz 2.6.32-5-686-bigmem #1 SMP Wed Jan 11 13:17:56 UTC 2012
> i686 GNU/Linux
>
> Jan 22 22:40:40 zzz-zzz kernel: [176617.648985] BUG: soft lockup -
> CPU#13 stuck for 61s! [nagios3:2070]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649040] Modules linked in:
> netconsole configfs joydev usbhid hid xt_multiport iptable_filter
> ip_tables x_tables 8021q garp stp loop snd_pcm snd_timer snd soundcore
> snd_page_alloc ioatdma pcspkr evdev cdc_ether usbnet button processor
> serio_raw dca mii shpchp pci_hotplug i2c_i801 i2c_core ext4 mbcache
> jbd2 crc16 raid10 md_mod sd_mod crc_t10dif ata_generic uhci_hcd
> megaraid_sas ata_piix ehci_hcd libata usbcore scsi_mod nls_base
> thermal bnx2 thermal_sys [last unloaded: netconsole]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649078]
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649082] Pid: 2070, comm:
> nagios3 Not tainted (2.6.32-5-686-bigmem #1) System x3550 M3
> -[7944D2M]-
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649085] EIP: 0060:[<c10249bb>]
> EFLAGS: 00000202 CPU: 13
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649094] EIP is at
> native_flush_tlb_others+0x85/0xa6
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649096] EAX: 00000282 EBX:
> c14661ac ECX: c10200d8 EDX: 00000020
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649099] ESI: 00000005 EDI:
> 00000140 EBP: c14661a0 ESP: ee4c9a3c
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649101]  DS: 007b ES: 007b FS:
> 00d8 GS: 00e0 SS: 0068
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649104] CR0: 8005003b CR2:
> b758a376 CR3: 2eb7e000 CR4: 000006f0
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649106] DR0: 00000000 DR1:
> 00000000 DR2: 00000000 DR3: 00000000
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649108] DR6: ffff0ff0 DR7: 00000400
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649110] Call Trace:
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649116]  [<c1024aa3>] ?
> flush_tlb_page+0x5d/0x65
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649120]  [<c1023e90>] ?
> ptep_set_access_flags+0x59/0x63
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649125]  [<c10a1040>] ?
> do_wp_page+0x3b9/0x7dd
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649131]  [<c1031770>] ?
> finish_task_switch+0x76/0x95
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649135]  [<c10b61a0>] ?
> kmem_cache_free+0x78/0xaf
> Jan 22 22:40:40 zzz-zzz kernel: [176617.649138]  [<c1031770>] ?
> finish_task_switch+0x76/0x95
> Jan 22 22:40:40 zzz-zzz kernel: [1766Jan 23 07:13:24 zzz-zzz
> syslog-ng[1807]: syslog-ng starting up; version='3.1.3'
>
> Cheers,
> Blair

