On Fri, 23 Mar 2012 21:38:20 -0500, Jonathan Nieder <jrnie...@gmail.com> wrote:
> Daniel Kahn Gillmor wrote:
> 
> > I'm about to try to reboot it again to see if i can get it back to
> > stability under the lenny hypervisor and kernel, but i'll need to do
> > that with the rescue 2.6.32-5-486 image as well, so it's possible that
> > i'll have another backtrace or crash to follow up with in a little bit.
> 
> Ok, thanks again.  I'd suggest blacklisting the i915 module to rule it
> out as a cause.

Rebooted the machine into 2.6.32-5-486 again, this time with i915
blacklisted, and got this crash during boot (even before the handoff to init):

Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... [    6.202557] BUG: unable to handle kernel paging request at 04040f7c
[    6.204010] IP: [<c107dc0c>] pmd_none_or_clear_bad+0x0/0x27
[    6.204010] *pde = 00000000 
[    6.204010] Oops: 0000 [#1] 
[    6.204010] last sysfs file: /sys/power/resume
[    6.204010] Modules linked in: ext3 jbd mbcache dm_mod raid1 md_mod sd_mod crc_t10dif ata_generic uhci_hcd tg3 thermal ata_piix libphy 3c59x mii tulip ehci_hcd thermal_sys libata scsi_mod usbcore nls_base [last unloaded: scsi_wait_scan]
[    6.204010] 
[    6.204010] Pid: 44, comm: udevd Not tainted (2.6.32-5-486 #1) HP d530 SFF(DG784A)
[    6.204010] EIP: 0060:[<c107dc0c>] EFLAGS: 00010206 CPU: 0
[    6.204010] EIP is at pmd_none_or_clear_bad+0x0/0x27
[    6.204010] EAX: 04040f7c EBX: b7c00000 ECX: 04040404 EDX: 04040f7c
[    6.204010] ESI: c321bce0 EDI: b7879000 EBP: f72ca8f0 ESP: f72e7eb4
[    6.204010]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[    6.204010] Process udevd (pid: 44, ti=f72e6000 task=f72f0820 task.ti=f72e6000)
[    6.204010] Stack:
[    6.204010]  c107efe7 c1018d00 fffb2000 ec06b067 f72f0820 b7879fff edfe7065 c31d4ec0
[    6.204010] <0> 00000000 f72e7f44 00000001 0039cede 00000000 f72cec00 b787a000 04040f7c
[    6.204010] <0> 04040f7c 00000004 c1073d95 ffffffff 00000000 fffb21e4 f72cec00 c1338508
[    6.204010] Call Trace:
[    6.204010]  [<c107efe7>] ? unmap_vmas+0x1ba/0x5b4
[    6.204010]  [<c1018d00>] ? kmap_atomic_prot+0xbd/0xe0
[    6.204010]  [<c1073d95>] ? ____pagevec_lru_add+0xf4/0x102
[    6.204010]  [<c108293e>] ? exit_mmap+0x90/0xf9
[    6.204010]  [<c1025ddd>] ? jiffies_to_timeval+0x1c/0x33
[    6.204010]  [<c1020c9d>] ? mmput+0x32/0x92
[    6.204010]  [<c1023b4a>] ? exit_mm+0xaa/0xb1
[    6.204010]  [<c104ac93>] ? acct_collect+0x5b/0x109
[    6.204010]  [<c1025146>] ? do_exit+0x184/0x579
[    6.204010]  [<c1025589>] ? do_group_exit+0x4e/0x71
[    6.204010]  [<c10255bd>] ? sys_exit_group+0x11/0x14
[    6.204010]  [<c100312c>] ? syscall_call+0x7/0xb
[    6.204010] Code: 2d c1 68 98 05 2d c1 e8 84 71 1c 00 31 d2 89 d0 8d b6 00 00 00 00 89 44 24 14 89 d8 8b 54 24 14 e8 79 66 f9 ff 90 83 c4 18 5b c3 <8b> 10 89 c1 b8 01 00 00 00 85 d2 74 19 81 e2 fb 0f 00 00 31 c0
[    6.204010] EIP: [<c107dc0c>] pmd_none_or_clear_bad+0x0/0x27 SS:ESP 0068:f72e7eb4
[    6.204010] CR2: 0000000004040f7c
[    6.406606] ---[ end trace e36674c63db8ef72 ]---
[    6.411216] Fixing recursive fault but reboot is needed!
done.
INIT: version 2.88 booting
Starting the hotplug events dispatcher: udevd[    8.961493] udev[370]: starting version 164


Any ideas?  i915 isn't in the list of loaded modules above, so I think
this safely rules it out as the cause.

Also, i need to retract my claim that it was running the lenny kernel
for years until just recently, now that i've inspected the logs from the
machine more closely.

It looks like it was running as a lenny xen system up until Feb. 17,
2011, at which point it switched to running a squeeze stack based on
xen-hypervisor-4.0-i386 (v. 4.0-2) and linux-image-2.6.32-5-xen-686
(v. 2.6.32-30).

This combination ran on this hardware for over a year (i know, i know)
without trouble, and crashed on March 9th with this message:

Mar  9 08:04:17 monkey kernel: [33331163.489222] BUG: unable to handle kernel paging request at 04247c8b
Mar  9 08:04:17 monkey kernel: [33331163.489252] IP: [<04247c8b>] 0x4247c8b
Mar  9 08:04:17 monkey kernel: [33331163.489273] *pdpt = 0000000001469001 *pde = 0000000000000000
Mar  9 08:04:17 monkey kernel: [33331163.489293] Oops: 0000 [#1] SMP
Mar  9 08:04:17 monkey kernel: [33331163.489310] last sysfs file: /sys/devices/virtual/block/md1/md/mismatch_cnt
Mar  9 08:04:17 monkey kernel: [33331163.489324] Modules linked in: xt_state iptable_mangle xt_physdev iptable_filter ipt_MASQUERADE ipt_REDIRECT xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp xen_evtchn xenfs dummy loop snd_intel8x0 snd_ac97_codec i915 ac97_bus drm_kms_helper snd_pcm snd_timer i2c_i801 drm snd pl2303 i2c_algo_bit soundcore evdev parport_pc pcspkr psmouse processor usbserial parport video acpi_processor serio_raw shpchp rng_core snd_page_alloc i2c_core output pci_hotplug button ext3 jbd mbcache dm_mod raid1 md_mod sd_mod crc_t10dif ata_generic tg3 3c59x ata_piix floppy tulip uhci_hcd mii libphy libata ehci_hcd scsi_mod usbcore nls_base thermal thermal_sys [last unloaded: scsi_wait_scan]
Mar  9 08:04:17 monkey kernel: [33331163.489673] 
Mar  9 08:04:17 monkey kernel: [33331163.489684] Pid: 5807, comm: sshd Not tainted (2.6.32-5-xen-686 #1) HP d530 SFF(DG784A)
Mar  9 08:04:17 monkey kernel: [33331163.489697] EIP: 0061:[<04247c8b>] EFLAGS: 00210206 CPU: 0
Mar  9 08:04:17 monkey kernel: [33331163.489711] EIP is at 0x4247c8b
Mar  9 08:04:17 monkey kernel: [33331163.489721] EAX: cf28f240 EBX: cf28f240 ECX: cf28fea0 EDX: 04247c8b
Mar  9 08:04:17 monkey kernel: [33331163.489732] ESI: cf28f5a0 EDI: cf28f240 EBP: cea53434 ESP: cdc55f24
Mar  9 08:04:17 monkey kernel: [33331163.489744]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Mar  9 08:04:17 monkey kernel: [33331163.489757] Process sshd (pid: 5807, ti=cdc54000 task=cef83740 task.ti=cdc54000)
Mar  9 08:04:17 monkey kernel: [33331163.489769] Stack:
Mar  9 08:04:17 monkey kernel: [33331163.489776]  c10a66a4 c2f93b58 cea53400 c10a67d0 00000000 000000ba c2f93b58 cea53400
Mar  9 08:04:17 monkey kernel: [33331163.489814] <0> cef83740 cea53400 c103573a 00000000 c1038b42 c2a2a780 c128eaf0 c2a2a780
Mar  9 08:04:17 monkey kernel: [33331163.489855] <0> cef83740 0000ff00 0000ff00 c103a2eb 00000001 00000004 cdc55fb4 b7868810
Mar  9 08:04:17 monkey kernel: [33331163.489900] Call Trace:
Mar  9 08:04:17 monkey kernel: [33331163.489917]  [<c10a66a4>] ? remove_vma+0x1e/0x48
Mar  9 08:04:17 monkey kernel: [33331163.489932]  [<c10a67d0>] ? exit_mmap+0x102/0x119
Mar  9 08:04:17 monkey kernel: [33331163.489948]  [<c103573a>] ? mmput+0x37/0xa9
Mar  9 08:04:17 monkey kernel: [33331163.489963]  [<c1038b42>] ? exit_mm+0xd5/0xdc
Mar  9 08:04:17 monkey kernel: [33331163.489978]  [<c128eaf0>] ? _spin_lock_irq+0xb/0x21
Mar  9 08:04:17 monkey kernel: [33331163.489992]  [<c103a2eb>] ? do_exit+0x1a3/0x5cd
Mar  9 08:04:17 monkey kernel: [33331163.490007]  [<c1290870>] ? do_page_fault+0x2f1/0x307
Mar  9 08:04:17 monkey kernel: [33331163.490021]  [<c103a774>] ? do_group_exit+0x5f/0x82
Mar  9 08:04:17 monkey kernel: [33331163.490034]  [<c103a7a8>] ? sys_exit_group+0x11/0x14
Mar  9 08:04:17 monkey kernel: [33331163.490049]  [<c1008f9c>] ? syscall_call+0x7/0xb
Mar  9 08:04:17 monkey kernel: [33331163.490059] Code:  Bad EIP value.
Mar  9 08:04:17 monkey kernel: [33331163.490075] EIP: [<04247c8b>] 0x4247c8b SS:ESP 0069:cdc55f24
Mar  9 08:04:17 monkey kernel: [33331163.490096] CR2: 0000000004247c8b
Mar  9 08:04:17 monkey kernel: [33331163.490439] ---[ end trace 131f0fb1a676a803 ]---
Mar  9 08:04:17 monkey kernel: [33331163.490450] Fixing recursive fault but reboot is needed!


Since that crash, i haven't been able to get the system running stably
(hence these bug reports).
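
Both oopses above were raised while a process was exiting: the udevd
trace and the sshd trace each pass through do_exit and exit_mmap.  A
minimal sketch for lining the two call traces up, assuming the oops text
has been saved into strings.  The names TRACE_RE, trace_functions and
common_functions are made up for illustration and are not part of any
kernel tooling, and the two sample strings are short excerpts of the
traces above:

```python
import re

# Matches call-trace entries such as "[<c108293e>] ? exit_mmap+0x90/0xf9".
# The "IP:"/"EIP:" lines have no "?" and are deliberately not matched.
TRACE_RE = re.compile(r"\[<[0-9a-f]+>\]\s+\?\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_functions(oops_text):
    """Return the function names of all call-trace entries, in order."""
    # Collapse all whitespace so entries wrapped by a mail client match again.
    joined = " ".join(oops_text.split())
    return TRACE_RE.findall(joined)


def common_functions(first, second):
    """Return functions that occur in both traces, in the first trace's order."""
    in_second = set(trace_functions(second))
    return [name for name in trace_functions(first) if name in in_second]


# Excerpt of the 2.6.32-5-486 boot crash (udevd).
BOOT_TRACE = """
[    6.204010]  [<c107efe7>] ? unmap_vmas+0x1ba/0x5b4
[    6.204010]  [<c108293e>] ? exit_mmap+0x90/0xf9
[    6.204010]  [<c1020c9d>] ? mmput+0x32/0x92
"""

# Excerpt of the March 9 crash (sshd), wrapped as in the mail.
SSHD_TRACE = """
Mar  9 08:04:17 monkey kernel: [33331163.489917]  [<c10a66a4>] ?
remove_vma+0x1e/0x48
Mar  9 08:04:17 monkey kernel: [33331163.489932]  [<c10a67d0>] ?
exit_mmap+0x102/0x119
Mar  9 08:04:17 monkey kernel: [33331163.489948]  [<c103573a>] ? mmput+0x37/0xa9
"""

if __name__ == "__main__":
    print(common_functions(BOOT_TRACE, SSHD_TRACE))
```

Feeding it the complete traces instead of the excerpts gives the full
list of functions the two crashes share.  It only compares names and
says nothing about which frame is actually at fault.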

Any thoughts or suggestions of what i should be looking at next?  Do i
need to replace the hardware?

     --dkg



