On 06/23/15 15:30, Mark Kettenis wrote:
Date: Thu, 18 Jun 2015 23:25:15 +0200 (CEST)
From: Mark Kettenis <[email protected]>
Date: Thu, 18 Jun 2015 22:48:40 +0200
From: Jan Vlach <[email protected]>
psycho0: uncorrectable DMA error AFAR 6656a250 (pa=0 tte=0/49c10012)
AFSR 410000ff40800000
AFAICT this indicates an uncorrectable ECC error during a DMA
transfer. Sadly that suggests your hardware is dying.
Might be worth trying to reseat your memory modules, or swap them out.
Bleah. Looked into the wrong manual. This isn't an ECC error but an
IOMMU error instead. And that almost certainly is a kernel bug of
some sort. Will try to hunt it down.
What was your last kernel that worked properly?
With sthen@'s help I have tracked down the last kernel that does not
display this issue for me. It is:
OpenBSD 5.6-current (GENERIC) #203: Tue Sep 2 19:32:42 MDT 2014
The full dmesg is below at [1]. The following kernel does panic:
OpenBSD 5.6-current (GENERIC) #205: Thu Sep 4 10:59:20 MDT 2014
The full dmesg for this one is below at [2].
The panic happens when I run tar -xvzf and pkg_add -v in another shell.
The panic is:
panic: psycho0: uncorrectable DMA error AFAR 66742250 (pa=0
tte=0/65b46012) AFSR 410000ff40800000
kdb breakpoint at 15563a4
Stopped at Debugger+0x8: nop
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
psycho_ue(400007b2900, 40008f7ee00, 0, 4, 40008df7c40, 40008fd0cf0) at
psycho_ue+0x7c
intr_handler(e0017ec8, 400007b2a00, 3c582, 0, 0, 0) at intr_handler+0xc
sparc_interrupt(0, 3, 40008df5610, 4000f7e9cd8, 1, 0) at
sparc_interrupt+0x298
sys_write(4000906dd50, 4000f7e9db8, 4000f7e9df8, 4000f7e6000,
fffffffffffffff0, 14b) at sys_write+0xb0
syscall(4000f7e9ed0, 404, 25ee90e148, 25ee90e14c, 0, 0) at syscall+0x16c
softtrap(3, 262656ce84, 54, 25ee90c80c, 0, 0) at softtrap+0x19c
ddb> ps
PID PPID PGRP UID S FLAGS WAIT COMMAND
29573 21263 21263 0 3 0x83 pipewr gzip
21263 18483 21263 0 2 0x3 tar
18483 20923 18483 1000 3 0x8b pause ksh
*20923 3390 3390 1000 7 0x10 sshd
3390 27250 3390 0 3 0x92 poll sshd
21865 9404 21865 1000 3 0x83 ttyin ksh
9404 12081 12081 1000 3 0x90 select sshd
12081 27250 12081 0 3 0x92 poll sshd
31553 1 31553 0 3 0x83 ttyin getty
25492 1 25492 0 3 0x80 select cron
23049 1 23049 99 3 0x90 poll sndiod
17520 2596 2596 95 3 0x90 kqread smtpd
30507 2596 2596 95 3 0x90 kqread smtpd
15728 2596 2596 95 3 0x90 kqread smtpd
21278 2596 2596 95 3 0x90 kqread smtpd
20677 2596 2596 95 3 0x90 kqread smtpd
7937 2596 2596 103 3 0x90 kqread smtpd
2596 1 2596 0 3 0x80 kqread smtpd
27250 1 27250 0 3 0x80 select sshd
15422 21838 21838 83 3 0x90 poll ntpd
--db_more--
21838 1 21838 0 3 0x80 poll ntpd
30432 15344 15344 74 3 0x90 bpf pflogd
15344 1 15344 0 3 0x80 netio pflogd
23915 26895 26895 73 3 0x90 poll syslogd
26895 1 26895 0 3 0x80 netio syslogd
27216 1 27216 77 3 0x90 poll dhclient
16326 1 16326 0 3 0x80 poll dhclient
28617 0 0 0 3 0x14200 aiodoned aiodoned
32169 0 0 0 3 0x14200 syncer update
18972 0 0 0 3 0x14200 cleaner cleaner
19331 0 0 0 3 0x14200 reaper reaper
9312 0 0 0 3 0x14200 pgdaemon pagedaemon
27783 0 0 0 3 0x14200 bored crypto
24568 0 0 0 3 0x14200 pftm pfpurge
31452 0 0 0 3 0x14200 usbtsk usbtask
22103 0 0 0 3 0x14200 usbatsk usbatsk
6707 0 0 0 3 0x14200 bored sensors
25160 0 0 0 3 0x14200 bored systqmp
13845 0 0 0 3 0x14200 bored systq
28168 0 0 0 3 0x14200 bored syswq
29666 0 0 0 3 0x40014200 idle0
23885 0 0 0 3 0x14200 kmalloc kmthread
1 0 1 0 3 0x82 wait init
--db_more--
LOM event: +73d+2h26m5s host FAULT: watchdog triggered
0 -1 0 0 3 0x10200 scheduler swapper
If there is anything else I can do to help debug this issue, let me know.
Looking through the commit logs for early September 2014, nothing
obvious jumps out at me.
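To narrow the commit log search, the two build banners can be turned into a UTC window. This is only a rough sketch: the helper name build_time is made up for illustration, it assumes the banner times are MDT (UTC-6), and a build timestamp is later than the checkout it was built from, so the real window starts and ends somewhat earlier.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the banners are stamped in Mountain Daylight Time (UTC-6).
MDT = timezone(timedelta(hours=-6), "MDT")

BANNER_RE = re.compile(
    r"#\d+: \w{3} (\w{3}) +(\d+) (\d\d:\d\d:\d\d) MDT (\d{4})"
)

def build_time(banner):
    """Return the build time of a kernel banner line as a UTC datetime."""
    m = BANNER_RE.search(banner)
    if m is None:
        raise ValueError("no MDT build timestamp found in banner")
    # Month name, day, time and year; the weekday is skipped.
    stamp = datetime.strptime(" ".join(m.groups()), "%b %d %H:%M:%S %Y")
    return stamp.replace(tzinfo=MDT).astimezone(timezone.utc)

good = build_time(
    "OpenBSD 5.6-current (GENERIC) #203: Tue Sep 2 19:32:42 MDT 2014")
bad = build_time(
    "OpenBSD 5.6-current (GENERIC) #205: Thu Sep 4 10:59:20 MDT 2014")

# Commits of interest landed no later than the bad build.
window = bad - good
```

With the two banners from this mail, the window is a bit under two days, ending at 2014-09-04 16:59:20 UTC.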
On -current I was getting lots of the following error messages on the
console:
psycho0: correctable DMA error AFAR
I would then get an uncorrectable error and a panic.
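In case it helps anyone comparing reports, here is a small sketch that pulls the fields out of these psycho0 messages and checks whether the AFAR lies inside the DVMA map printed in the dmesg. The regular expressions are based only on the lines quoted in this mail, and the function names are made up for illustration; nothing here comes from the kernel source.

```python
import re

# Field layout taken from the psycho0 lines quoted in this mail only.
PSYCHO_RE = re.compile(
    r"psycho\d+: (?P<kind>uncorrectable|correctable) DMA error "
    r"AFAR (?P<afar>[0-9a-f]+)"
    r"(?: \(pa=(?P<pa>[0-9a-f]+) tte=(?P<tte_hi>[0-9a-f]+)"
    r"/(?P<tte_lo>[0-9a-f]+)\))?"
    r"(?: AFSR (?P<afsr>[0-9a-f]+))?"
)
DVMA_RE = re.compile(r"dvma map (?P<lo>[0-9a-f]+)-(?P<hi>[0-9a-f]+)")

def parse_psycho_error(text):
    """Parse one psycho DMA error message; return None if it has no AFAR."""
    # Collapse line wraps so a message split by the mailer still matches.
    m = PSYCHO_RE.search(" ".join(text.split()))
    if m is None:
        return None
    fields = {}
    for name, value in m.groupdict().items():
        if name == "kind" or value is None:
            fields[name] = value
        else:
            fields[name] = int(value, 16)  # all numbers are printed in hex
    return fields

def in_dvma_map(afar, dmesg_line):
    """True if afar is inside the range of a 'dvma map lo-hi' dmesg line."""
    m = DVMA_RE.search(dmesg_line)
    if m is None:
        raise ValueError("no dvma map range in line")
    return int(m.group("lo"), 16) <= afar <= int(m.group("hi"), 16)
```

Both AFAR values quoted in this thread (6656a250 and 66742250) fall inside the 60000000-7fffffff DVMA map, which at least seems consistent with the problem being on the IOMMU side.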
Cheers
Fred
[1]
console is /pci@1f,0/isa@7/serial@0,3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved.
http://www.OpenBSD.org
OpenBSD 5.6-current (GENERIC) #203: Tue Sep 2 19:32:42 MDT 2014
[email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 536870912 (512MB)
avail mem = 512761856 (489MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K
external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-0, PCI bus 0
psycho0: dvma map 60000000-7fffffff
pci0 at psycho0
ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"dma" at ebus0 addr 0-ffff ivec 0x2a not configured
rtc0 at ebus0 addr 70-71: m5819
power0 at ebus0 addr 2000-2007 ivec 0x23
lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
"flashprom" at ebus0 addr 0-7ffff not configured
alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz
clock
iic0 at alipm0
"max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6,
address 00:03:ba:13:a8:c7
amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc,
address 00:03:ba:13:a8:c8
amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec
0x7e4, version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3:
DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
atapiscsi0 at pciide0 channel 1 drive 1
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootpath: /pci@1f,0/ide@d,0/disk@2,0
root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
[2]
console is /pci@1f,0/isa@7/serial@0,3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved.
http://www.OpenBSD.org
OpenBSD 5.6-current (GENERIC) #205: Thu Sep 4 10:59:20 MDT 2014
[email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 536870912 (512MB)
avail mem = 512761856 (489MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K
external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-0, PCI bus 0
psycho0: dvma map 60000000-7fffffff
pci0 at psycho0
ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"dma" at ebus0 addr 0-ffff ivec 0x2a not configured
rtc0 at ebus0 addr 70-71: m5819
power0 at ebus0 addr 2000-2007 ivec 0x23
lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
"flashprom" at ebus0 addr 0-7ffff not configured
alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz
clock
iic0 at alipm0
"max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6,
address 00:03:ba:13:a8:c7
amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc,
address 00:03:ba:13:a8:c8
amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec
0x7e4, version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3:
DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
atapiscsi0 at pciide0 channel 1 drive 1
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootpath: /pci@1f,0/ide@d,0/disk@2,0
root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b