Hi everybody,

I'm running a RAIDframe-enabled kernel on amd64 (dmesg below), and this morning
I saw that one of the drives in a RAID1 array had failed:

ahci0: unrecoverable errors (IS: 8000000<IFS>), disabling port.
raid0: IO Error.  Marking /dev/sd3l as failed.
raid0: node (Rod) returned fail, rolling backward
Unable to verify raid1 parity: can't read stripe.
Could not verify parity.
raid0: Error re-writing parity!

The array was busy re-writing parity at the time because of a hard lockup
earlier. These were the only errors I saw. Running
"raidctl -R /dev/sd3l raid0" resulted in a RAIDframe panic, the details of
which I was unable to record. After a reboot, that same command succeeded
and the array is now reconstructing.

What does the ahci0 error above mean? Is the drive bad, or is it a
motherboard issue? (Both are new, so I don't know the long-term behaviour
of either yet.)
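
For reference, here is a minimal sketch of how I read the IS value. It
assumes the number printed by the driver is the AHCI port interrupt status
register (PxIS) in hex, and that the bit names follow the AHCI
specification. The table and the function name decode_pxis are my own, not
anything taken from the driver.

```python
# Error-related bits of the AHCI port interrupt status register (PxIS).
# Assumption: bit positions as given in the AHCI specification.
PXIS_ERROR_BITS = {
    30: ("TFES", "task file error"),
    29: ("HBFS", "host bus fatal error"),
    28: ("HBDS", "host bus data error"),
    27: ("IFS", "interface fatal error"),
    26: ("INFS", "interface non-fatal error"),
    24: ("OFS", "overflow"),
}


def decode_pxis(value):
    """Return the names of the error bits set in value, highest bit first."""
    return [
        name
        for bit, (name, _description) in sorted(PXIS_ERROR_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]


# Value copied from the kernel message above.
print(decode_pxis(0x8000000))
```

If that reading is right, the only bit set is 27 (IFS), which matches the
<IFS> tag in the message, but it does not tell me which side of the link
is at fault.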

The SMART error log does contain an error for this drive, and nothing for
the other drive:

$ sudo atactl sd3 smartreadlog summary 
Error count: 1

Error 1:
    error register: 0x84
    sector count register: 0x10
    LBA Low register: 0xd3
    LBA Mid register: 0x65
    LBA High register: 0x69
    device register: 0x45
    status register: 0x51
    state: off-line or self-test
    timestamp: 177
    history:
        control register:       0x00    0x00    0x00    0x00    0x00
        features register:      0x80    0x80    0x80    0x80    0x80
        sector count register:  0xb0    0xb8    0xc0    0xc8    0xd0
        LBA Low register:       0x63    0xe3    0x63    0xe3    0x63
        LBA Mid register:       0x63    0x63    0x64    0x64    0x65
        LBA High register:      0x69    0x69    0x69    0x69    0x69
        device register:        0x40    0x40    0x40    0x40    0x40
        command register:       0x60    0x60    0x60    0x60    0x60
        timestamp:              469383900       469383900       469383900       469383900       469383900
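
A rough sketch of how the register values in that log entry could be
decoded. The bit names are an assumption on my part, taken from the usual
ATA command set definitions (status bit 4 is left out because its meaning
differs between revisions), and the helper names decode and lba_low24 are
made up for this example. The summary only shows three LBA bytes, so only
the low 24 bits of the address are reconstructed.

```python
# Assumed ATA bit assignments for the error and status registers.
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}


def decode(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [names[bit] for bit in sorted(names, reverse=True) if value & (1 << bit)]


def lba_low24(low, mid, high):
    """Combine the LBA Low/Mid/High registers into the low 24 bits of the LBA."""
    return (high << 16) | (mid << 8) | low


# Values copied from "Error 1" in the SMART summary above.
print(decode(0x84, ERROR_BITS))
print(decode(0x51, STATUS_BITS))
print(hex(lba_low24(0xd3, 0x65, 0x69)))
```

Under those assumptions the error register decodes to ICRC plus ABRT,
i.e. an interface CRC error followed by an aborted command, and the
failing address falls inside the range of the last command in the history.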


OpenBSD 4.6 (GENERIC+RAIDAUTO) #0: Wed Oct 21 22:20:32 CEST 2009
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC+RAIDAUTO
real mem = 2145910784 (2046MB)
avail mem = 2070822912 (1974MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xf06c0 (56 entries)
bios0: vendor American Megatrends Inc. version "0403" date 09/02/2008
bios0: ASUSTeK Computer INC. P5BV-C
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P2(S4) P0P3(S4) P0P1(S4) PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) EUSB(S4) MC97(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) P0P8(S4) P0P9(S4) SLPB(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz, 2500.05 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: apic clock running at 333MHz
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0 apid 4 pa 0xfec00000, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 5 (P0P2)
acpiprt2 at acpi0: bus -1 (P0P3)
acpiprt3 at acpi0: bus 1 (P0P1)
acpiprt4 at acpi0: bus 4 (P0P4)
acpiprt5 at acpi0: bus -1 (P0P5)
acpiprt6 at acpi0: bus -1 (P0P6)
acpiprt7 at acpi0: bus -1 (P0P7)
acpiprt8 at acpi0: bus 3 (P0P8)
acpiprt9 at acpi0: bus 2 (P0P9)
acpicpu0 at acpi0: PSS
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
cpu0: Enhanced SpeedStep 2500 MHz: speeds: 2497, 1998 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 3200/3210 Host" rev 0x01
ppb0 at pci0 dev 1 function 0 "Intel 3200/3210 PCIE" rev 0x01: apic 4 int 16 (irq 11)
pci1 at ppb0 bus 5
ppb1 at pci0 dev 28 function 0 "Intel 82801GB PCIE" rev 0x01: apic 4 int 16 (irq 11)
pci2 at ppb1 bus 4
ppb2 at pci0 dev 28 function 4 "Intel 82801G PCIE" rev 0x01: apic 4 int 16 (irq 11)
pci3 at ppb2 bus 3
mskc0 at pci3 dev 0 function 0 "Marvell Yukon 88E8056" rev 0x14, Yukon-2 EC Ultra rev. B0 (0x3): apic 4 int 16 (irq 11)
msk0 at mskc0 port A: address 00:26:18:65:bc:ab
eephy0 at msk0 phy 0: 88E1149 Gigabit PHY, rev. 1
ppb3 at pci0 dev 28 function 5 "Intel 82801G PCIE" rev 0x01: apic 4 int 17 (irq 10)
pci4 at ppb3 bus 2
mskc1 at pci4 dev 0 function 0 "Marvell Yukon 88E8056" rev 0x14, Yukon-2 EC Ultra rev. B0 (0x3): apic 4 int 17 (irq 10)
msk1 at mskc1 port A: address 00:26:18:65:ba:d5
eephy1 at msk1 phy 0: 88E1149 Gigabit PHY, rev. 1
uhci0 at pci0 dev 29 function 0 "Intel 82801GB USB" rev 0x01: apic 4 int 23 (irq 7)
uhci1 at pci0 dev 29 function 1 "Intel 82801GB USB" rev 0x01: apic 4 int 19 (irq 5)
ehci0 at pci0 dev 29 function 7 "Intel 82801GB USB" rev 0x01: apic 4 int 23 (irq 7)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb4 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xe1
pci5 at ppb4 bus 1
rl0 at pci5 dev 0 function 0 "Realtek 8139" rev 0x10: apic 4 int 21 (irq 6), address 00:e0:4d:52:a8:d1
rlphy0 at rl0 phy 0: RTL internal PHY
ral0 at pci5 dev 1 function 0 "Ralink RT2561S" rev 0x00: apic 4 int 22 (irq 11), address 00:11:6b:3d:7f:6a
ral0: MAC/BBP RT2561C, RF RT2527
vendor "NetMos", unknown product 0x9865 (class communications subclass serial, rev 0x00) at pci5 dev 2 function 0 not configured
vendor "NetMos", unknown product 0x9865 (class communications subclass serial, rev 0x00) at pci5 dev 2 function 1 not configured
vendor "NetMos", unknown product 0x9865 (class communications subclass parallel, rev 0x00) at pci5 dev 2 function 2 not configured
vga1 at pci5 dev 3 function 0 "XGI Technology Volari Z7" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 31 function 0 "Intel 82801GB LPC" rev 0x01
pciide0 at pci0 dev 31 function 1 "Intel 82801GB IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
ahci0 at pci0 dev 31 function 2 "Intel 82801GR AHCI" rev 0x01: apic 4 int 19 (irq 5), AHCI 1.1
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: <ATA, OCZ-VERTEX, 1.41> SCSI3 0/direct fixed
sd0: 30533MB, 512 bytes/sec, 62533296 sec total
sd1 at scsibus0 targ 1 lun 0: <ATA, OCZ-VERTEX, 1.41> SCSI3 0/direct fixed
sd1: 30533MB, 512 bytes/sec, 62533296 sec total
sd2 at scsibus0 targ 2 lun 0: <ATA, Hitachi HDT72103, ST2O> SCSI3 0/direct fixed
sd2: 305245MB, 512 bytes/sec, 625142448 sec total
sd3 at scsibus0 targ 3 lun 0: <ATA, Hitachi HDT72103, ST2O> SCSI3 0/direct fixed
sd3: 305245MB, 512 bytes/sec, 625142448 sec total
ichiic0 at pci0 dev 31 function 3 "Intel 82801GB SMBus" rev 0x01: apic 4 int 19 (irq 5)
iic0 at ichiic0
lm1 at iic0 addr 0x2d: W83627DHG
spdmem0 at iic0 addr 0x51: 1GB DDR2 SDRAM ECC PC2-6400CL5
spdmem1 at iic0 addr 0x53: 1GB DDR2 SDRAM ECC PC2-6400CL5
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627DHG rev 0x25
lm2 at wbsio0 port 0x290/8: W83627DHG
lm1 detached
Kernelized RAIDframe activated
mtrr: Pentium Pro MTRR support
raid1 at root: (RAID Level 1) total number of sectors is 62524800 (30529 MB) as root
raid0 at root: (RAID Level 1) total number of sectors is 563913472 (275348 MB)
softraid0 at root
root on raid1a
WARNING: / was not properly unmounted
swapmount: no device
ahci0: unrecoverable errors (IS: 8000000<IFS>), disabling port.
raid0: IO Error.  Marking /dev/sd3l as failed.
raid0: node (Rod) returned fail, rolling backward
Unable to verify raid1 parity: can't read stripe.
Could not verify parity.
raid0: Error re-writing parity!
ral0: Michael MIC failure
Closing the opened device: /dev/sd3l
About to (re-)open the device for rebuilding: /dev/sd3l
RECON: Initiating in-place reconstruction on
       row 0 col 0 -> spare at row 0 col 0.
Quiescence reached...

Thanks,
-- 
Jurjen Oskam

Savage's Law of Expediency:
        You want it bad, you'll get it bad.
