On 06/23/15 15:30, Mark Kettenis wrote:
Date: Thu, 18 Jun 2015 23:25:15 +0200 (CEST)
From: Mark Kettenis <[email protected]>

Date: Thu, 18 Jun 2015 22:48:40 +0200
From: Jan Vlach <[email protected]>

psycho0: uncorrectable DMA error AFAR 6656a250 (pa=0 tte=0/49c10012)
AFSR 410000ff40800000

AFAICT this indicates an uncorrectable ECC error during a DMA
transfer.  Sadly that suggests your hardware is dying.

Might be worth trying to reseat your memory modules, or swap them out.

Bleah.  Looked into the wrong manual.  This isn't an ECC error but an
IOMMU error instead.  And that almost certainly is a kernel bug of
some sorts.  Will try to hunt it down.

What was your last kernel that worked properly?


With sthen@'s help I have tracked down the kernel that does not display this issue for me. It is:

OpenBSD 5.6-current (GENERIC) #203: Tue Sep  2 19:32:42 MDT 2014

The full dmesg is below at [1]. The following kernel does panic:

OpenBSD 5.6-current (GENERIC) #205: Thu Sep  4 10:59:20 MDT 2014

The full dmesg for this is below at [2].

The panic happens when I run tar -xvzf and, in another shell, pkg_add -v. The panic is:

panic: psycho0: uncorrectable DMA error AFAR 66742250 (pa=0 tte=0/65b46012) AFSR 410000ff40800000
kdb breakpoint at 15563a4
Stopped at      Debugger+0x8:   nop
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
psycho_ue(400007b2900, 40008f7ee00, 0, 4, 40008df7c40, 40008fd0cf0) at psycho_ue+0x7c
intr_handler(e0017ec8, 400007b2a00, 3c582, 0, 0, 0) at intr_handler+0xc
sparc_interrupt(0, 3, 40008df5610, 4000f7e9cd8, 1, 0) at sparc_interrupt+0x298
sys_write(4000906dd50, 4000f7e9db8, 4000f7e9df8, 4000f7e6000, fffffffffffffff0, 14b) at sys_write+0xb0
syscall(4000f7e9ed0, 404, 25ee90e148, 25ee90e14c, 0, 0) at syscall+0x16c
softtrap(3, 262656ce84, 54, 25ee90c80c, 0, 0) at softtrap+0x19c
ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 29573  21263  21263      0  3        0x83  pipewr        gzip
 21263  18483  21263      0  2         0x3                tar
 18483  20923  18483   1000  3        0x8b  pause         ksh
*20923   3390   3390   1000  7        0x10                sshd
  3390  27250   3390      0  3        0x92  poll          sshd
 21865   9404  21865   1000  3        0x83  ttyin         ksh
  9404  12081  12081   1000  3        0x90  select        sshd
 12081  27250  12081      0  3        0x92  poll          sshd
 31553      1  31553      0  3        0x83  ttyin         getty
 25492      1  25492      0  3        0x80  select        cron
 23049      1  23049     99  3        0x90  poll          sndiod
 17520   2596   2596     95  3        0x90  kqread        smtpd
 30507   2596   2596     95  3        0x90  kqread        smtpd
 15728   2596   2596     95  3        0x90  kqread        smtpd
 21278   2596   2596     95  3        0x90  kqread        smtpd
 20677   2596   2596     95  3        0x90  kqread        smtpd
  7937   2596   2596    103  3        0x90  kqread        smtpd
  2596      1   2596      0  3        0x80  kqread        smtpd
 27250      1  27250      0  3        0x80  select        sshd
 15422  21838  21838     83  3        0x90  poll          ntpd
--db_more--
 21838      1  21838      0  3        0x80  poll          ntpd
 30432  15344  15344     74  3        0x90  bpf           pflogd
 15344      1  15344      0  3        0x80  netio         pflogd
 23915  26895  26895     73  3        0x90  poll          syslogd
 26895      1  26895      0  3        0x80  netio         syslogd
 27216      1  27216     77  3        0x90  poll          dhclient
 16326      1  16326      0  3        0x80  poll          dhclient
 28617      0      0      0  3     0x14200  aiodoned      aiodoned
 32169      0      0      0  3     0x14200  syncer        update
 18972      0      0      0  3     0x14200  cleaner       cleaner
 19331      0      0      0  3     0x14200  reaper        reaper
  9312      0      0      0  3     0x14200  pgdaemon      pagedaemon
 27783      0      0      0  3     0x14200  bored         crypto
 24568      0      0      0  3     0x14200  pftm          pfpurge
 31452      0      0      0  3     0x14200  usbtsk        usbtask
 22103      0      0      0  3     0x14200  usbatsk       usbatsk
  6707      0      0      0  3     0x14200  bored         sensors
 25160      0      0      0  3     0x14200  bored         systqmp
 13845      0      0      0  3     0x14200  bored         systq
 28168      0      0      0  3     0x14200  bored         syswq
 29666      0      0      0  3  0x40014200                idle0
 23885      0      0      0  3     0x14200  kmalloc       kmthread
     1      0      1      0  3        0x82  wait          init
--db_more--
LOM event: +73d+2h26m5s host FAULT: watchdog triggered
     0     -1      0      0  3     0x10200  scheduler     swapper

If there is anything else I can do to help debug this issue let me know.

Looking through the commit logs for early Sept 2014, nothing obvious jumps out at me.

On -current I was getting lots of the following error messages on the console:

psycho0: correctable DMA error AFAR

I would then get an uncorrectable error and a panic.
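
For anyone who wants to cross-check the numbers, here is a minimal sketch (not part of the kernel or any OpenBSD tool) that pulls the fields out of the panic line and tests whether the AFAR falls inside the "dvma map" range printed in the dmesg. It assumes the AFAR is a DVMA (bus) address, which is what an IOMMU error would suggest; the function names and the tte_a/tte_b labels are made up for illustration and the two tte values are not interpreted.

```python
import re

# Matches the panic text as printed in this thread. The two values after
# "tte=" are kept as opaque numbers (tte_a, tte_b); this sketch does not
# try to say what they mean.
PANIC_RE = re.compile(
    r"(?P<dev>\w+): uncorrectable DMA error AFAR (?P<afar>[0-9a-f]+)\s+"
    r"\(pa=(?P<pa>[0-9a-f]+) tte=(?P<tte_a>[0-9a-f]+)/(?P<tte_b>[0-9a-f]+)\)\s+"
    r"AFSR (?P<afsr>[0-9a-f]+)"
)

# Matches the "dvma map" line from the dmesg.
DVMA_RE = re.compile(
    r"(?P<dev>\w+): dvma map (?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)"
)


def parse_panic(text):
    """Return the panic fields as a dict (hex values as ints), or None."""
    m = PANIC_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    dev = fields.pop("dev")
    out = {name: int(value, 16) for name, value in fields.items()}
    out["dev"] = dev
    return out


def parse_dvma_map(text):
    """Return (start, end) of the DVMA range as ints, or None."""
    m = DVMA_RE.search(text)
    if m is None:
        return None
    return int(m.group("start"), 16), int(m.group("end"), 16)


def afar_in_dvma(afar, dvma_range):
    """True if afar lies inside the inclusive DVMA range."""
    start, end = dvma_range
    return start <= afar <= end


if __name__ == "__main__":
    panic_line = ("panic: psycho0: uncorrectable DMA error AFAR 66742250 "
                  "(pa=0 tte=0/65b46012) AFSR 410000ff40800000")
    dmesg_line = "psycho0: dvma map 60000000-7fffffff"

    panic = parse_panic(panic_line)
    dvma = parse_dvma_map(dmesg_line)
    print("AFAR %#x in DVMA range: %s" % (panic["afar"], afar_in_dvma(panic["afar"], dvma)))
    print("pa reported as zero:", panic["pa"] == 0)
```

With the values from this report the AFAR (66742250) does lie inside 60000000-7fffffff and pa is 0. That is consistent with a DMA access to an address the IOMMU had no valid mapping for, but it does not prove it.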

Cheers

Fred

[1]
console is /pci@1f,0/isa@7/serial@0,3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org

OpenBSD 5.6-current (GENERIC) #203: Tue Sep  2 19:32:42 MDT 2014
    [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 536870912 (512MB)
avail mem = 512761856 (489MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-0, PCI bus 0
psycho0: dvma map 60000000-7fffffff
pci0 at psycho0
ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"dma" at ebus0 addr 0-ffff ivec 0x2a not configured
rtc0 at ebus0 addr 70-71: m5819
power0 at ebus0 addr 2000-2007 ivec 0x23
lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
"flashprom" at ebus0 addr 0-7ffff not configured
alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
iic0 at alipm0
"max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address 00:03:ba:13:a8:c7
amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address 00:03:ba:13:a8:c8
amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
atapiscsi0 at pciide0 channel 1 drive 1
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootpath: /pci@1f,0/ide@d,0/disk@2,0
root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b

[2]
console is /pci@1f,0/isa@7/serial@0,3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org

OpenBSD 5.6-current (GENERIC) #205: Thu Sep  4 10:59:20 MDT 2014
    [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 536870912 (512MB)
avail mem = 512761856 (489MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-0, PCI bus 0
psycho0: dvma map 60000000-7fffffff
pci0 at psycho0
ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"dma" at ebus0 addr 0-ffff ivec 0x2a not configured
rtc0 at ebus0 addr 70-71: m5819
power0 at ebus0 addr 2000-2007 ivec 0x23
lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
"flashprom" at ebus0 addr 0-7ffff not configured
alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
iic0 at alipm0
"max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address 00:03:ba:13:a8:c7
amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address 00:03:ba:13:a8:c8
amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7cc for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
atapiscsi0 at pciide0 channel 1 drive 1
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
bootpath: /pci@1f,0/ide@d,0/disk@2,0
root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
