On 2015/09/04 18:02, Fred wrote:
> On 06/23/15 15:30, Mark Kettenis wrote:
> >>Date: Thu, 18 Jun 2015 23:25:15 +0200 (CEST)
> >>From: Mark Kettenis <[email protected]>
> >>
> >>>Date: Thu, 18 Jun 2015 22:48:40 +0200
> >>>From: Jan Vlach <[email protected]>
> >>
> >>>psycho0: uncorrectable DMA error AFAR 6656a250 (pa=0 tte=0/49c10012)
> >>>AFSR 410000ff40800000
> >>
> >>AFAICT this indicates an uncorrectable ECC error during a DMA
> >>transfer. Sadly that suggests your hardware is dying.
> >>
> >>Might be worth trying to reseat your memory modules, or swap them out.
> >
> >Bleah. Looked into the wrong manual. This isn't an ECC error but an
> >IOMMU error instead. And that almost certainly is a kernel bug of
> >some sorts. Will try to hunt it down.
> >
> >What was your last kernel that worked properly?
> >
>
> With sthen@'s help I have tracked down the kernel that does not display this
> issue for me its:
>
> OpenBSD 5.6-current (GENERIC) #203: Tue Sep 2 19:32:42 MDT 2014
>
> full dmesg below [1] and following kernel does have a panic:
>
> OpenBSD 5.6-current (GENERIC) #205: Thu Sep 4 10:59:20 MDT 2014

could it be this?

CVSROOT: /cvs
Module name: src
Changes by: [email protected] 2014/09/03 18:36:00

Modified files:
    sys/sys : pool.h
    sys/kern : subr_pool.c

Log message:
rework how pools with large pages (>PAGE_SIZE) are implemented.

this moves the size of the pool page (not arch page) out of the
pool allocator into struct pool. this lets us create only two pools
for the automatically determined large page allocations instead of
256 of them.

while here support using slack space in large pages for the
pool_item_header by requiring km_alloc provide pool page aligned
memory.

lastly, instead of doing incorrect math to figure how how many arch
pages to use for large pool pages, just use powers of two.

ok mikeb@

> full dmesg for this is below at [2].
>
> The panic happens when I do a tar -xvzf and pkg_add -v in another shell, the
> panic is:
>
> panic: psycho0: uncorrectable DMA error AFAR 66742250 (pa=0 tte=0/65b46012)
> AFSR 410000ff40800000
> kdb breakpoint at 15563a4
> Stopped at Debugger+0x8: nop
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb> trace
> psycho_ue(400007b2900, 40008f7ee00, 0, 4, 40008df7c40, 40008fd0cf0) at psycho_ue+0x7c
> intr_handler(e0017ec8, 400007b2a00, 3c582, 0, 0, 0) at intr_handler+0xc
> sparc_interrupt(0, 3, 40008df5610, 4000f7e9cd8, 1, 0) at sparc_interrupt+0x298
> sys_write(4000906dd50, 4000f7e9db8, 4000f7e9df8, 4000f7e6000, fffffffffffffff0, 14b) at sys_write+0xb0
> syscall(4000f7e9ed0, 404, 25ee90e148, 25ee90e14c, 0, 0) at syscall+0x16c
> softtrap(3, 262656ce84, 54, 25ee90c80c, 0, 0) at softtrap+0x19c
> ddb> ps
>    PID   PPID   PGRP    UID  S      FLAGS  WAIT      COMMAND
>  29573  21263  21263      0  3       0x83  pipewr    gzip
>  21263  18483  21263      0  2        0x3            tar
>  18483  20923  18483   1000  3       0x8b  pause     ksh
> *20923   3390   3390   1000  7       0x10            sshd
>   3390  27250   3390      0  3       0x92  poll      sshd
>  21865   9404  21865   1000  3       0x83  ttyin     ksh
>   9404  12081  12081   1000  3       0x90  select    sshd
>  12081  27250  12081      0  3       0x92  poll      sshd
>  31553      1  31553      0  3       0x83  ttyin     getty
>  25492      1  25492      0  3       0x80  select    cron
>  23049      1  23049     99  3       0x90  poll      sndiod
>  17520   2596   2596     95  3       0x90  kqread    smtpd
>  30507   2596   2596     95  3       0x90  kqread    smtpd
>  15728   2596   2596     95  3       0x90  kqread    smtpd
>  21278   2596   2596     95  3       0x90  kqread    smtpd
>  20677   2596   2596     95  3       0x90  kqread    smtpd
>   7937   2596   2596    103  3       0x90  kqread    smtpd
>   2596      1   2596      0  3       0x80  kqread    smtpd
>  27250      1  27250      0  3       0x80  select    sshd
>  15422  21838  21838     83  3       0x90  poll      ntpd
> --db_more--
>  21838      1  21838      0  3       0x80  poll      ntpd
>  30432  15344  15344     74  3       0x90  bpf       pflogd
>  15344      1  15344      0  3       0x80  netio     pflogd
>  23915  26895  26895     73  3       0x90  poll      syslogd
>  26895      1  26895      0  3       0x80  netio     syslogd
>  27216      1  27216     77  3       0x90  poll      dhclient
>  16326      1  16326      0  3       0x80  poll      dhclient
>  28617      0      0      0  3    0x14200  aiodoned  aiodoned
>  32169      0      0      0  3    0x14200  syncer    update
>  18972      0      0      0  3    0x14200  cleaner   cleaner
>  19331      0      0      0  3    0x14200  reaper    reaper
>   9312      0      0      0  3    0x14200  pgdaemon  pagedaemon
>  27783      0      0      0  3    0x14200  bored     crypto
>  24568      0      0      0  3    0x14200  pftm      pfpurge
>  31452      0      0      0  3    0x14200  usbtsk    usbtask
>  22103      0      0      0  3    0x14200  usbatsk   usbatsk
>   6707      0      0      0  3    0x14200  bored     sensors
>  25160      0      0      0  3    0x14200  bored     systqmp
>  13845      0      0      0  3    0x14200  bored     systq
>  28168      0      0      0  3    0x14200  bored     syswq
>  29666      0      0      0  3 0x40014200            idle0
>  23885      0      0      0  3    0x14200  kmalloc   kmthread
>      1      0      1      0  3       0x82  wait      init
> --db_more--
> LOM event: +73d+2h26m5s host FAULT: watchdog triggered
>      0     -1      0      0  3    0x10200  scheduler swapper
>
> If there is anything else I can do to help debug this issue let me know.
>
> Looking through the commit logs for early Sept 2014 - there is nothing
> obvious that jumps out at me.
>
> On -current I was getting lots of the following error messages on the
> console:
>
> psycho0: correctable DMA error AFAR
>
> I would then get an uncorrectable error and a panic.
>
> Cheers
>
> Fred
>
> [1]
> console is /pci@1f,0/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California. All rights reserved.
> Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org
>
> OpenBSD 5.6-current (GENERIC) #203: Tue Sep 2 19:32:42 MDT 2014
> [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 536870912 (512MB)
> avail mem = 512761856 (489MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-0, PCI bus 0
> psycho0: dvma map 60000000-7fffffff
> pci0 at psycho0
> ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> "dma" at ebus0 addr 0-ffff ivec 0x2a not configured
> rtc0 at ebus0 addr 70-71: m5819
> power0 at ebus0 addr 2000-2007 ivec 0x23
> lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
> com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> "flashprom" at ebus0 addr 0-7ffff not configured
> alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
> spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
> spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
> dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address 00:03:ba:13:a8:c7
> amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
> dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address 00:03:ba:13:a8:c8
> amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
> ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, version 1.0, legacy support
> pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> pciide0: channel 0 disabled (no drives)
> wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
> wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> atapiscsi0 at pciide0 channel 1 drive 1
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
> wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@2,0
> root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
>
> [2]
> console is /pci@1f,0/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California. All rights reserved.
> Copyright (c) 1995-2014 OpenBSD. All rights reserved. http://www.OpenBSD.org
>
> OpenBSD 5.6-current (GENERIC) #205: Thu Sep 4 10:59:20 MDT 2014
> [email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 536870912 (512MB)
> avail mem = 512761856 (489MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-0, PCI bus 0
> psycho0: dvma map 60000000-7fffffff
> pci0 at psycho0
> ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> "dma" at ebus0 addr 0-ffff ivec 0x2a not configured
> rtc0 at ebus0 addr 70-71: m5819
> power0 at ebus0 addr 2000-2007 ivec 0x23
> lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
> com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> "flashprom" at ebus0 addr 0-7ffff not configured
> alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x54: 128MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x55: 128MB SDRAM registered ECC PC133CL2
> spdmem2 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
> spdmem3 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
> dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address 00:03:ba:13:a8:c7
> amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
> dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address 00:03:ba:13:a8:c8
> amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
> ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, version 1.0, legacy support
> pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> pciide0: channel 0 disabled (no drives)
> wd0 at pciide0 channel 1 drive 0: <WDC WD800BB-00DAA3>
> wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> atapiscsi0 at pciide0 channel 1 drive 1
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
> wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@2,0
> root on wd0a (22f410dbbff6a15e.a) swap on wd0b dump on wd0b
>
