On Sat, Nov 5, 2011 at 8:29 AM, Norman Golisz <[email protected]> wrote:
> Hi Jeffrey,
>
> On Sat Nov 5 2011 07:49, Forman, Jeffrey wrote:
> > I am in the process of building a new OpenBSD i386 5.0-release Intel Atom
> > D510-based firewall/router. I was editing some config files on the box in
> > emacs when the process threw a core dump. Thinking perhaps it was just
> > emacs, I went to do something else, 'sudo pkg_add -v mutt', and received a
> > core dump again.
> >
> > I went looking for stress-testing apps, thinking I might have a bad CPU or
> > RAM module, and came upon 'stress'. Several runs of stress seemed to cause
> > kernel panics, and even after upgrading to a 5.0 snapshot from November
> > 13, 2011[1], I was still seeing panics. I have included the details below
> > for those more knowledgeable about debugging.
> >
> > Thanks in advance,
> > Jeff
> >
> > [1] http://openbsd.mirrors.tds.net/pub/OpenBSD/snapshots/i386/
> >
> > Full stress command line:
> > # stress --cpu 8 --io 4 --vm 2 -m 5 --vm-bytes 128M --timeout 30s -v
>
> I ran this on my machine as well; it's an i386 single-core processor
> running a single-processor kernel. I ran this stress test several times
> with no panic. Your panic trace also points to trouble in uvm's page
> fault handler, with an MP locking mechanism involved.
>
> Therefore, could you boot bsd.sp and run the stress test again? Does it
> run well then?
>
> Norman.
>
> OpenBSD 5.0-current (GENERIC) #85: Wed Nov 2 22:27:31 MDT 2011
> [email protected]:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) M processor 1.70GHz ("GenuineIntel" 686-class) 1.70 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,SBF,EST,TM2
> real mem = 2146299904 (2046MB)
> avail mem = 2101112832 (2003MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 06/18/07, BIOS32 rev. 0 @ 0xfd750, SMBIOS rev. 2.33 @ 0xe0010 (61 entries)
> bios0: vendor IBM version "1RETDRWW (3.23 )" date 06/18/2007
> bios0: IBM 2374VDL
> apm0 at bios0: Power Management spec V1.2
> acpi at bios0 function 0x0 not configured
> pcibios0 at bios0: rev 2.1 @ 0xfd6e0/0x920
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdea0/272 (15 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
> pcibios0: PCI bus #6 is the last bus
> bios0: ROM list: 0xc0000/0x10000 0xdc000/0x4000! 0xe0000/0x10000!
> cpu0 at mainbus0: (uniprocessor)
> cpu0: Enhanced SpeedStep 1695 MHz: speeds: 1700, 1400, 1200, 1000, 800, 600 MHz
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> io address conflict 0x5800/0x8
> io address conflict 0x5808/0x4
> io address conflict 0x5810/0x8
> io address conflict 0x580c/0x4
> pchb0 at pci0 dev 0 function 0 "Intel 82855PM Host" rev 0x03
> intelagp0 at pchb0
> agp0 at intelagp0: aperture at 0xd0000000, size 0x10000000
> ppb0 at pci0 dev 1 function 0 "Intel 82855PM AGP" rev 0x03
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "ATI Radeon Mobility M7" rev 0x00
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> radeondrm0 at vga1: irq 11
> drm0 at radeondrm0
> uhci0 at pci0 dev 29 function 0 "Intel 82801DB USB" rev 0x01: irq 11
> uhci1 at pci0 dev 29 function 1 "Intel 82801DB USB" rev 0x01: irq 11
> uhci2 at pci0 dev 29 function 2 "Intel 82801DB USB" rev 0x01: irq 11
> ehci0 at pci0 dev 29 function 7 "Intel 82801DB USB" rev 0x01: irq 11
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x81
> pci2 at ppb1 bus 2
> mem address conflict 0xb0000000/0x1000
> mem address conflict 0xb1000000/0x1000
> cbb0 at pci2 dev 0 function 0 "TI PCI4520 CardBus" rev 0x01: irq 11
> cbb1 at pci2 dev 0 function 1 "TI PCI4520 CardBus" rev 0x01: irq 11
> em0 at pci2 dev 1 function 0 "Intel PRO/1000MT (82540EP)" rev 0x03: irq 11, address 00:11:25:32:45:72
> iwi0 at pci2 dev 2 function 0 "Intel PRO/Wireless 2200BG" rev 0x05: irq 11, address 00:0e:35:bc:03:c1
> cardslot0 at cbb0 slot 0 flags 0
> cardbus0 at cardslot0: bus 3 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia0 at cardslot0
> cardslot1 at cbb1 slot 1 flags 0
> cardbus1 at cardslot1: bus 6 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia1 at cardslot1
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801DBM LPC" rev 0x01: 24-bit timer at 3579545Hz
> pciide0 at pci0 dev 31 function 1 "Intel 82801DBM IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> wd0 at pciide0 channel 0 drive 0: <SAMSUNG HM160HC>
> wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <MATSHITA, UJDA745 DVD/CDRW, 1.03> ATAPI 5/cdrom removable
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> ichiic0 at pci0 dev 31 function 3 "Intel 82801DB SMBus" rev 0x01: irq 11
> iic0 at ichiic0
> spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC2700CL2.5
> spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC2700CL2.5
> auich0 at pci0 dev 31 function 5 "Intel 82801DB AC97" rev 0x01: irq 11, ICH4 AC97
> ac97: codec id 0x41445374 (Analog Devices AD1981B)
> ac97: codec features headphone, 20 bit DAC, No 3D Stereo
> audio0 at auich0
> "Intel 82801DB Modem" rev 0x01 at pci0 dev 31 function 6 not configured
> usb1 at uhci0: USB revision 1.0
> uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb2 at uhci1: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci2: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> aps0 at isa0 port 0x1600/31
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> mtrr: Pentium Pro MTRR support
> uhidev0 at uhub2 port 1 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/18.00 addr 2
> uhidev0: iclass 3/1
> ums0 at uhidev0: 6 buttons, Z dir
> wsmouse1 at ums0 mux 0
> umass0 at uhub2 port 2 configuration 1 interface 0 "TOSHIBA TransMemory" rev 2.00/1.00 addr 3
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd0 at scsibus1 targ 1 lun 0: <TOSHIBA, TransMemory, 1.00> SCSI2 0/direct removable serial.09306544C940942403F1
> sd0: 7643MB, 512 bytes/sector, 15654848 sectors
> ugen0 at uhub3 port 2 "STMicroelectronics Biometric Coprocessor" rev 1.00/0.01 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> softraid0: sd1 was not shutdown properly
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 005> SCSI2 0/direct fixed
> sd1: 61446MB, 512 bytes/sector, 125843232 sectors
> root device (default wd0a): sd1a
> swap device (default sd1b): wd0b
> root on sd1a swap on wd0b dump on wd0b
>
>
Norman had a good idea, one I had not considered, since the OpenBSD installer
pre-selected the MP kernel given the CPU I am running. I booted the box on
the SP kernel and got through a few 30-second stress runs, but when I ran
one set for 90 seconds I received another panic:
pmap_page_remove: pg=0xd2342ce0: va=8f9d4000, pv_ptp=0xd2956da4
pmap_page_remove: PTP's phys addr: actual=0, recorded=71112000
panic: pmap_page_remove: mapped managed page has invalid pv_ptp field
Stopped at Debugger+0x4: popl %ebp
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> ps
PID PPID PGRP UID S FLAGS WAIT COMMAND
17010 22606 22606 0 2 0 stress
9052 22606 22606 0 2 0 stress
18279 22606 22606 0 2 0 stress
5192 22606 22606 0 2 0 stress
22557 22606 22606 0 2 0 stress
31262 22606 22606 0 2 0 stress
29473 22606 22606 0 2 0 stress
12947 22606 22606 0 2 0 stress
21017 22606 22606 0 2 0 stress
24320 22606 22606 0 2 0 stress
26998 22606 22606 0 2 0 stress
23032 22606 22606 0 2 0 stress
19909 22606 22606 0 2 0 stress
23706 22606 22606 0 2 0 stress
* 4608 22606 22606 0 7 0 stress
4070 22606 22606 0 2 0 stress
10494 22606 22606 0 2 0 stress
22606 14553 22606 0 3 0x80 wait stress
14553 25593 14553 0 3 0x88 pause ksh
25593 10983 25593 1000 3 0x88 pause ksh
10983 12086 12086 1000 3 0x80 select sshd
12086 4870 12086 0 3 0x80 poll sshd
8902 1 8902 0 3 0x80 ttyin getty
17445 1 17445 0 3 0x80 ttyin getty
294 1 294 0 3 0x80 ttyin getty
10058 1 10058 0 3 0x80 ttyin getty
8789 1 8789 0 3 0x80 ttyin getty
11270 1 11270 0 3 0x80 ttyin getty
3408 1 3408 0 3 0x80 select cron
17208 1 17208 0 3 0x80 select inetd
1018 11457 11457 95 3 0x80 kqread smtpd
1605 11457 11457 95 3 0x80 kqread smtpd
20814 11457 11457 95 3 0x80 kqread smtpd
32539 11457 11457 95 3 0x80 kqread smtpd
3932 11457 11457 95 3 0x80 kqread smtpd
30194 11457 11457 95 3 0x80 kqread smtpd
9971 11457 11457 95 3 0x80 kqread smtpd
17016 11457 11457 95 3 0x80 kqread smtpd
11457 1 11457 0 3 0x80 kqread smtpd
4870 1 4870 0 3 0x80 select sshd
8865 32197 22908 83 3 0x80 poll ntpd
32197 22908 22908 83 3 0x80 poll ntpd
22908 1 22908 0 3 0x80 poll ntpd
6070 12921 12921 74 3 0x80 bpf pflogd
12921 1 12921 0 3 0x80 netio pflogd
138 10686 10686 73 2 0x80 syslogd
10686 1 10686 0 3 0x80 netio syslogd
15 0 0 0 3 0x100200 aiodoned aiodoned
14 0 0 0 3 0x100200 syncer update
13 0 0 0 3 0x100200 cleaner cleaner
12 0 0 0 3 0x100200 reaper reaper
11 0 0 0 3 0x100200 pgdaemon pagedaemon
10 0 0 0 3 0x100200 bored crypto
9 0 0 0 3 0x100200 pftm pfpurge
8 0 0 0 3 0x100200 usbtsk usbtask
7 0 0 0 3 0x100200 usbatsk usbatsk
6 0 0 0 3 0x100200 bored intelrel
5 0 0 0 3 0x100200 acpi0 acpi0
4 0 0 0 3 0x100200 bored syswq
3 0 0 0 3 0x40100200 idle0
2 0 0 0 3 0x100200 kmalloc kmthread
1 0 1 0 3 0x80 wait init
0 -1 0 0 3 0x200 scheduler swapper
ddb> trace
Debugger(d08d3b58,de8a9e08,d08d7008,de8a9e08,d09b9b94) at Debugger+0x4
panic(d08d7008,0,71112000,d2956da4,d8af872c) at panic+0x5d
pmap_page_remove(d2342ce0,d8af872c,d0a23360,d0a48c20,d89d7be0) at pmap_page_remove+0x1d2
uvm_anfree(d8af8738,d8910028,0,d8915f40,d8915f40) at uvm_anfree+0xac
amap_wipeout(d89d7be0,0,10,0,d8c4c9dc) at amap_wipeout+0x94
uvm_map_unreference_amap(d8915f40,0,de8a9ecc,d03d8591,d8c4c9e0) at uvm_map_unreference_amap+0x2f
uvm_unmap_detach(d8ac8e50,0,92823000,de8a9f0c,d8c292c0) at uvm_unmap_detach+0x5b
sys_munmap(d8c292c0,de8a9f64,de8a9f84,de8a9fa8,d8c292c0) at sys_munmap+0x15d
syscall() at syscall+0x2d8
--- syscall (number 134217728) ---
0x2:
Am I barking up the wrong tree trying to determine whether I really have a
hardware problem? I am happy to test diffs and build from source if other
developers think there might be a bug to fix here.
Cheers,
Jeff