Hi all,

At work I've got a server with an LSI MegaRAID (dmesg below) that suddenly seems to be killing hard drives. Last Thursday I had one drive fail, and the system didn't begin rebuilding onto the hot spare until I rebooted.

Today I lost another drive in the same enclosure, safte0. I pulled a replacement drive off the shelf, swapped out the dead one, and ran bioctl -H 0:9 sd0 to mark it as a hot spare, but no rebuild has started yet. Note that 1:0 in safte1 was already marked as a hot spare, but that is a separate SAF-TE enclosure, and I've never been sure whether a hot spare works across enclosures. Until this happened, I had always kept a hot spare in each enclosure.

Here's the latest bioctl -v -i ami0 output:

[EMAIL PROTECTED]:/home/jross $ sudo bioctl -v -i ami0
Volume  Status               Size Device
 ami0 0 Degraded      72999763968 sd0     RAID1
      0 Failed        73403465728 0:13.0  safte0 <HITACHI HUS151473VL3800 S3C0>
                                                 '        J5VHVNPB'
      1 Online        73403465728 0:10.0  safte0 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W09L5A0050B499004B'
 ami0 1 Online        72999763968 sd1     RAID1
      0 Online        73403465728 0:11.0  safte0 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W06MNA0050B4AD01D3'
      1 Online        73403465728 0:12.0  safte0 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W0A6VA0050B4A80C0C'
 ami0 2 Online        72999763968 sd2     RAID1
      0 Online        73403465728 1:4.0   safte1 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3VZV2JA0050B4AX04C2'
      1 Online        73403465728 1:1.0   safte1 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W0726A0050B49W01CB'
 ami0 3 Hot spare     73403465728 0:9.0   safte0 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W093EA0050B44V0578'
 ami0 4 Hot spare     73403465728 1:0.0   safte1 <HITACHI HUS103073FL3800 SA1B>
                                                 'V3W07PSA0050B4710207'
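
For watching this kind of output without reading it by eye, here is a minimal sketch that picks out unhealthy volumes and member disks from bioctl-style text. It assumes the column layout shown in the listing above; the function name degraded_report and the regular expressions are illustrative and not part of bioctl, and other controllers or releases may print different fields.

```python
import re

# Volume or hot-spare line, e.g. " ami0 0 Degraded 72999763968 sd0 RAID1".
# Assumption: the controller name is lowercase letters followed by digits.
VOLUME_RE = re.compile(
    r"^\s*(?P<ctrl>[a-z]+\d+)\s+(?P<idx>\d+)\s+"
    r"(?P<status>Hot spare|\S+)\s+(?P<size>\d+)\s+(?P<dev>\S+)"
)
# Member disk line, e.g. "0 Failed 73403465728 0:13.0 safte0 <HITACHI ...>".
DISK_RE = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<status>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<loc>\d+:\d+\.\d+)\s+(?P<encl>\S+)\s+<(?P<model>[^>]*)>"
)


def degraded_report(text):
    """Return (bad_volumes, bad_disks) found in bioctl-style output."""
    bad_volumes, bad_disks = [], []
    current = None  # volume that the following disk lines belong to
    for line in text.splitlines():
        m = VOLUME_RE.match(line)
        if m:
            current = f"{m['ctrl']} {m['idx']}"
            if m["status"] not in ("Online", "Hot spare"):
                bad_volumes.append((current, m["status"], m["dev"]))
            continue
        m = DISK_RE.match(line)
        if m and current and m["status"] != "Online":
            bad_disks.append((current, m["loc"], m["encl"], m["status"]))
    return bad_volumes, bad_disks


# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
Volume  Status               Size Device
 ami0 0 Degraded      72999763968 sd0     RAID1
0 Failed 73403465728 0:13.0 safte0 <HITACHI HUS151473VL3800 S3C0>
1 Online 73403465728 0:10.0 safte0 <HITACHI HUS103073FL3800 SA1B>
 ami0 1 Online        72999763968 sd1     RAID1
0 Online 73403465728 0:11.0 safte0 <HITACHI HUS103073FL3800 SA1B>
ami0 3 Hot spare 73403465728 0:9.0 safte0 <HITACHI HUS103073FL3800 SA1B>
"""

volumes, disks = degraded_report(SAMPLE)
print(volumes)
print(disks)
```

Fed from cron with the real command output, something like this could send mail as soon as a volume leaves the Online state, rather than waiting for someone to notice.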


Also interesting: safte0 will not blink any of its drives, while safte1 will.

[EMAIL PROTECTED]:/home/jross $ sudo bioctl -b 0:9 ami0
bioctl: BIOCBLINK: Operation not supported by device


Questions, then: these drives are all Hitachi Ultrastar 10K300s from 2005. Has anyone had any bad experiences with them? They are all still under warranty, and I don't suppose it's out of the question that 2 drives out of 8 would fail within 72 hours of each other, especially if the lot was bad.
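
As a rough sanity check on the "2 out of 8 within 72 hours" question, the sketch below computes how likely that would be if the drives failed independently at a constant rate. The 3% annual failure rate is purely an assumed placeholder, not a Hitachi figure or a measurement, and the function name is illustrative. A very small result would point toward a common cause (a bad lot, the enclosure, the controller) rather than coincidence.

```python
from math import comb, exp, log

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def prob_at_least_k_failures(n_drives, k, annual_failure_rate, window_hours):
    """P(at least k of n independent drives fail within window_hours).

    Assumes a constant hazard rate and independent failures, which is
    exactly the assumption a bad lot or a faulty enclosure would violate.
    """
    hazard_per_hour = -log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    p = 1.0 - exp(-hazard_per_hour * window_hours)  # per-drive probability
    return sum(
        comb(n_drives, i) * p**i * (1.0 - p) ** (n_drives - i)
        for i in range(k, n_drives + 1)
    )


# Hypothetical input: 8 drives, 3% assumed annual failure rate, 72-hour window.
print(prob_at_least_k_failures(8, 2, 0.03, 72))
```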

So far as I know, the SAFTE enclosures are identical. Why will one support blinking the drives and the other not?

Should the ami be rebuilding sd0 on its own now that I've set a hot spare, or do I need to kick off the rebuild with bioctl -R 0:9 sd0?

So far I haven't stumbled on the magic combination to make bioctl -q work:
[EMAIL PROTECTED]:/home/jross $ sudo bioctl -q 1:4
bioctl: Can't locate 1:4 device via /dev/bio
[EMAIL PROTECTED]:/home/jross $ sudo bioctl -q ami0
bioctl: DIOCINQ: No such file or directory
[EMAIL PROTECTED]:/home/jross $ sudo bioctl -q sd0
bioctl: DIOCINQ: Invalid argument

Hitachi's drive testing tool seems to be Windows-only, so are there any utilities that can check an individual drive while it's part of a RAID1? Or is it safe to assume that if a drive fails in the RAID, it is really dead? I'm trying to make sure I'm not seeing some kind of problem with the enclosure or the MegaRAID card before I start shipping drives back to Hitachi.

Thanks!

Jeff

OpenBSD 4.4-current (GENERIC.MP) #860: Mon Sep  1 13:55:06 MDT 2008
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(TM) CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem  = 2146988032 (2047MB)
avail mem = 2067562496 (1971MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/09/05, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.3 @ 0xf82a0 (48 entries)
bios0: vendor American Megatrends Inc. version "080008" date 02/09/2005
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC OEMB
acpi0: wakeup devices PS2K(S1) PS2M(S1) SMBS(S1) AUDI(S1) MODM(S1) USB0(S1) USB1(S1) USB2(S1) P0P1(S1)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 133MHz
cpu1 at mainbus0: apid 6 (application processor)
cpu1: Intel(R) Xeon(TM) CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Xeon(TM) CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
cpu3 at mainbus0: apid 7 (application processor)
cpu3: Intel(R) Xeon(TM) CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0: apid 9 pa 0xfec80000, version 20, 24 pins
ioapic2 at mainbus0: apid 10 pa 0xfec80400, version 20, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (P0P1)
acpiprt2 at acpi0: bus 3 (P2P3)
acpiprt3 at acpi0: bus 5 (P2P4)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
acpibtn0 at acpi0: SPBT
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1800 0xc9800/0x2200
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel E7501 Host" rev 0x01
ppb0 at pci0 dev 2 function 0 "Intel E7500 PCI" rev 0x01
pci1 at ppb0 bus 2
"Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 28 function 0 not configured
ppb1 at pci1 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci2 at ppb1 bus 5
"Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 30 function 0 not configured
ppb2 at pci1 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
pci3 at ppb2 bus 3
ppb3 at pci3 dev 3 function 0 "IBM 133 PCIX-PCIX" rev 0x03
pci4 at ppb3 bus 4
ami0 at pci4 dev 0 function 0 "Symbios Logic MegaRAID 320" rev 0x02: apic 9 int 0 (irq 10)
ami0: LSI 532, 32b, FW 414C, BIOS vH429, 128MB RAM
ami0: 2 channels, 0 FC loops, 3 logical drives
scsibus0 at ami0: 40 targets, initiator 40
sd0 at scsibus0 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
sd0: 69618MB, 512 bytes/sec, 142577664 sec total
sd1 at scsibus0 targ 1 lun 0: <AMI, Host drive #01, > SCSI2 0/direct fixed
sd1: 69618MB, 512 bytes/sec, 142577664 sec total
sd2 at scsibus0 targ 2 lun 0: <AMI, Host drive #02, > SCSI2 0/direct fixed
sd2: 69618MB, 512 bytes/sec, 142577664 sec total
scsibus1 at ami0: 16 targets, initiator 16
safte0 at scsibus1 targ 6 lun 0: <SUPER, GEM318, 0> SCSI2 3/processor fixed
scsibus2 at ami0: 16 targets, initiator 16
safte1 at scsibus2 targ 6 lun 0: <SUPER, GEM318, 0> SCSI2 3/processor fixed
ahc0 at pci3 dev 6 function 0 "Adaptec AHA-29160 U160" rev 0x02: apic 9 int 4 (irq 10)
scsibus3 at ahc0: 16 targets, initiator 7
st0 at scsibus3 targ 6 lun 0: <SEAGATE, DAT 9SP40-000, 910B> SCSI3 1/sequential removable
uhci0 at pci0 dev 29 function 0 "Intel 82801CA/CAM USB" rev 0x02: apic 8 int 16 (irq 10)
ppb4 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x42
pci5 at ppb4 bus 1
fxp0 at pci5 dev 1 function 0 "Intel 8255x" rev 0x10, i82551: apic 8 int 17 (irq 5), address 00:e0:81:26:a9:e4
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
vga1 at pci5 dev 2 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
skc0 at pci5 dev 4 function 0 "D-Link Systems DGE-530T A1" rev 0x11, Yukon (0x1): apic 8 int 19 (irq 9)
sk0 at skc0 port A: address 00:13:46:72:3b:1d
eephy0 at sk0 phy 0: Marvell 88E1011 Gigabit PHY, rev. 3
ichpcib0 at pci0 dev 31 function 0 "Intel 82801CA LPC" rev 0x02
pciide0 at pci0 dev 31 function 1 "Intel 82801CA IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus4 at atapiscsi0: 2 targets, initiator 7
cd0 at scsibus4 targ 0 lun 0: <SONY, DVD RW DW-U18A, UYS4> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
ichiic0 at pci0 dev 31 function 3 "Intel 82801CA/CAM SMBus" rev 0x02: apic 8 int 17 (irq 0)
iic0 at ichiic0
lm1 at iic0 addr 0x29: W83782D
spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM registered ECC PC2100CL2.5
spdmem1 at iic0 addr 0x51: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem2 at iic0 addr 0x54: 512MB DDR SDRAM registered ECC PC2100CL2.5
spdmem3 at iic0 addr 0x55: 512MB DDR SDRAM registered ECC PC2100CL2.5
usb0 at uhci0: USB revision 1.0
uhub0 at usb0 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x3a
lm2 at wbsio0 port 0x290/8: W83627HF
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
mtrr: Pentium Pro MTRR support
softraid0 at root
root on sd0a swap on sd0b dump on sd0b
