On 2015/09/04 18:02, Fred wrote:
> On 06/23/15 15:30, Mark Kettenis wrote:
> >>Date: Thu, 18 Jun 2015 23:25:15 +0200 (CEST)
> >>From: Mark Kettenis <[email protected]>
> >>
> >>>Date: Thu, 18 Jun 2015 22:48:40 +0200
> >>>From: Jan Vlach <[email protected]>
> >>
> >>>psycho0: uncorrectable DMA error AFAR 6656a250 (pa=0 tte=0/49c10012)
> >>>AFSR 410000ff40800000
> >>
> >>AFAICT this indicates an uncorrectable ECC error during a DMA
> >>transfer.  Sadly that suggests your hardware is dying.
> >>
> >>Might be worth trying to reseat your memory modules, or swap them out.
> >
> >Bleah.  Looked into the wrong manual.  This isn't an ECC error but an
> >IOMMU error instead.  And that almost certainly is a kernel bug of
> >some sorts.  Will try to hunt it down.
> >
> >What was your last kernel that worked properly?
> >
> 
> With sthen@'s help I have tracked down the last kernel that does not show
> this issue for me. It is:
> 
> OpenBSD 5.6-current (GENERIC) #203: Tue Sep  2 19:32:42 MDT 2014
> 
> The full dmesg is below at [1]. The following kernel does panic:
> 
> OpenBSD 5.6-current (GENERIC) #205: Thu Sep  4 10:59:20 MDT 2014

Could it be this?

CVSROOT:        /cvs
Module name:    src
Changes by:     [email protected]     2014/09/03 18:36:00

Modified files:
        sys/sys        : pool.h
        sys/kern       : subr_pool.c

Log message:
rework how pools with large pages (>PAGE_SIZE) are implemented.

this moves the size of the pool page (not arch page) out of the
pool allocator into struct pool. this lets us create only two pools
for the automatically determined large page allocations instead of
256 of them.

while here support using slack space in large pages for the
pool_item_header by requiring km_alloc provide pool page aligned
memory.

lastly, instead of doing incorrect math to figure out how many arch
pages to use for large pool pages, just use powers of two.

ok mikeb@
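
To make the last point of that log message concrete, here is a rough sketch of
what "just use powers of two" could look like. This is only an illustration of
the arithmetic, not the code in subr_pool.c: the function name is made up, and
the 8192-byte page size is my assumption for sparc64.

```python
PAGE_SIZE = 8192  # assumption: architecture page size in bytes on sparc64


def pool_page_size(item_size, page_size=PAGE_SIZE):
    """Hypothetical helper: smallest power-of-two multiple of page_size
    that can hold item_size bytes."""
    if item_size <= 0:
        raise ValueError("item_size must be positive")
    size = page_size
    while size < item_size:
        size *= 2  # doubling keeps the result a power-of-two multiple
    return size


# Print the pool page size chosen for a few example item sizes.
for n in (100, 8192, 8193, 20000, 70000):
    print(n, pool_page_size(n))
```

Anything up to one arch page stays at one page; larger items get 2, 4, 8, ...
pages, so no division or rounding of page counts is needed.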



> The full dmesg for this is below at [2].
> 
> The panic happens when I run tar -xvzf and pkg_add -v in another shell. The
> panic is:
> 
> panic: psycho0: uncorrectable DMA error AFAR 66742250 (pa=0 tte=0/65b46012)
> AFSR 410000ff40800000
> kdb breakpoint at 15563a4
> Stopped at      Debugger+0x8:   nop
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb> trace
> psycho_ue(400007b2900, 40008f7ee00, 0, 4, 40008df7c40, 40008fd0cf0) at
> psycho_ue+0x7c
> 
> intr_handler(e0017ec8, 400007b2a00, 3c582, 0, 0, 0) at intr_handler+0xc
> sparc_interrupt(0, 3, 40008df5610, 4000f7e9cd8, 1, 0) at
> sparc_interrupt+0x298
> sys_write(4000906dd50, 4000f7e9db8, 4000f7e9df8, 4000f7e6000,
> fffffffffffffff0, 14b) at sys_write+0xb0
> 
> syscall(4000f7e9ed0, 404, 25ee90e148, 25ee90e14c, 0, 0) at syscall+0x16c
> softtrap(3, 262656ce84, 54, 25ee90c80c, 0, 0) at softtrap+0x19c
> ddb> ps
>    PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>  29573  21263  21263      0  3        0x83  pipewr        gzip
>  21263  18483  21263      0  2         0x3                tar
>  18483  20923  18483   1000  3        0x8b  pause         ksh
> *20923   3390   3390   1000  7        0x10                sshd
>   3390  27250   3390      0  3        0x92  poll          sshd
>  21865   9404  21865   1000  3        0x83  ttyin         ksh
>   9404  12081  12081   1000  3        0x90  select        sshd
>  12081  27250  12081      0  3        0x92  poll          sshd
>  31553      1  31553      0  3        0x83  ttyin         getty
>  25492      1  25492      0  3        0x80  select        cron
>  23049      1  23049     99  3        0x90  poll          sndiod
>  17520   2596   2596     95  3        0x90  kqread        smtpd
>  30507   2596   2596     95  3        0x90  kqread        smtpd
>  15728   2596   2596     95  3        0x90  kqread        smtpd
>  21278   2596   2596     95  3        0x90  kqread        smtpd
>  20677   2596   2596     95  3        0x90  kqread        smtpd
>   7937   2596   2596    103  3        0x90  kqread        smtpd
>   2596      1   2596      0  3        0x80  kqread        smtpd
>  27250      1  27250      0  3        0x80  select        sshd
>  15422  21838  21838     83  3        0x90  poll          ntpd
> --db_more--
>  21838      1  21838      0  3        0x80  poll          ntpd
>  30432  15344  15344     74  3        0x90  bpf           pflogd
>  15344      1  15344      0  3        0x80  netio         pflogd
>  23915  26895  26895     73  3        0x90  poll          syslogd
>  26895      1  26895      0  3        0x80  netio         syslogd
>  27216      1  27216     77  3        0x90  poll          dhclient
>  16326      1  16326      0  3        0x80  poll          dhclient
>  28617      0      0      0  3     0x14200  aiodoned      aiodoned
>  32169      0      0      0  3     0x14200  syncer        update
>  18972      0      0      0  3     0x14200  cleaner       cleaner
>  19331      0      0      0  3     0x14200  reaper        reaper
>   9312      0      0      0  3     0x14200  pgdaemon      pagedaemon
>  27783      0      0      0  3     0x14200  bored         crypto
>  24568      0      0      0  3     0x14200  pftm          pfpurge
>  31452      0      0      0  3     0x14200  usbtsk        usbtask
>  22103      0      0      0  3     0x14200  usbatsk       usbatsk
>   6707      0      0      0  3     0x14200  bored         sensors
>  25160      0      0      0  3     0x14200  bored         systqmp
>  13845      0      0      0  3     0x14200  bored         systq
>  28168      0      0      0  3     0x14200  bored         syswq
>  29666      0      0      0  3  0x40014200                idle0
>  23885      0      0      0  3     0x14200  kmalloc       kmthread
>      1      0      1      0  3        0x82  wait          init
> --db_more--
> LOM event: +73d+2h26m5s host FAULT: watchdog triggered
>      0     -1      0      0  3     0x10200  scheduler     swapper
> 
> If there is anything else I can do to help debug this issue let me know.
> 
> Looking through the commit logs for early Sept 2014 - there is nothing
> obvious that jumps out at me.
> 
> On -current I was getting lots of the following error messages on the
> console:
> 
> psycho0: correctable DMA error AFAR
> 
> I would then get an uncorrectable error and a panic.
> 
> Cheers
> 
> Fred
> 
> [1]
> console is /pci@1f,0/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>       The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org
> 
> OpenBSD 5.6-current (GENERIC) #203: Tue Sep  2 19:32:42 MDT 2014
>     [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 536870912 (512MB)
> avail mem = 512761856 (489MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external
> (64 b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-0, PCI bus 0
> psycho0: dvma map 60000000-7fffffff
> pci0 at psycho0
> ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> "dma" at ebus0 addr 0-ffff ivec 0x2a not configured
> rtc0 at ebus0 addr 70-71: m5819
> power0 at ebus0 addr 2000-2007 ivec 0x23
> lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
> com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> "flashprom" at ebus0 addr 0-7ffff not configured
> alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz
> clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
> spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
> spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
> dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address
> 00:03:ba:13:a8:c7
> amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
> dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address
> 00:03:ba:13:a8:c8
> amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
> ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4,
> version 1.0, legacy support
> pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA,
> channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> pciide0: channel 0 disabled (no drives)
> wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
> wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> atapiscsi0 at pciide0 channel 1 drive 1
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
> wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@2,0
> root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
> 
> [2]
> console is /pci@1f,0/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>       The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org
> 
> OpenBSD 5.6-current (GENERIC) #205: Thu Sep  4 10:59:20 MDT 2014
>     [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 536870912 (512MB)
> avail mem = 512761856 (489MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external
> (64 b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-0, PCI bus 0
> psycho0: dvma map 60000000-7fffffff
> pci0 at psycho0
> ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> "dma" at ebus0 addr 0-ffff ivec 0x2a not configured
> rtc0 at ebus0 addr 70-71: m5819
> power0 at ebus0 addr 2000-2007 ivec 0x23
> lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
> com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> "flashprom" at ebus0 addr 0-7ffff not configured
> alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz
> clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
> spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
> spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
> dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address
> 00:03:ba:13:a8:c7
> amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
> dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address
> 00:03:ba:13:a8:c8
> amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
> ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4,
> version 1.0, legacy support
> pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA,
> channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> pciide0: channel 0 disabled (no drives)
> wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
> wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> atapiscsi0 at pciide0 channel 1 drive 1
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
> wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@2,0
> root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
> 
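
One cheap sanity check on the numbers quoted above: both AFAR values from the
panics (6656a250 and 66742250) can be compared against the "dvma map
60000000-7fffffff" line in dmesg. A minimal sketch, assuming the AFAR is the
bus-side (DVMA) address of the failing transfer; the helper name is made up
for illustration.

```python
DVMA_START = 0x60000000  # from "psycho0: dvma map 60000000-7fffffff"
DVMA_END = 0x7FFFFFFF    # inclusive upper bound from the same dmesg line


def in_dvma_window(afar_hex):
    """Return True if the AFAR (given as a hex string) is inside the window."""
    afar = int(afar_hex, 16)
    return DVMA_START <= afar <= DVMA_END


# The two AFAR values reported in this thread.
for afar in ("6656a250", "66742250"):
    print(afar, in_dvma_window(afar))
```

Both values fall inside the window. That is at least consistent with Mark's
IOMMU diagnosis, though it proves nothing by itself.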

