My first instinct is that this is a firmware issue, but since you provided such good steps to reproduce it, I will try to recreate this. Thank you for this outstanding report.
On Wed, Feb 20, 2008 at 01:42:59AM -0700, Matthew Mulrooney wrote:
> Hi there, I'm back with another LSI controller, and I'm experiencing
> problems with creating hot spares from bioctl. This seems to be the same
> problem that I posted to misc@ on Oct 16, 2006 with the subject line of:
>
>   [ami] Unable to set "Hot Spare" on MegaRAID SATA 300-8x
>
> I've got the same symptoms, but now with a PERC 4/Di controller. [And this
> time I've found a better workaround than just avoiding bioctl -H with this
> LSI controller :).]
>
> Problem summary
> ===============
> When I use bioctl to mark an Unused drive as a Hot Spare, that drive will
> fail to be integrated when another disk fails.
>
> The only way I've found to make that drive properly act as a Hot Spare is
> to set it as such from the LSI boot menu. If you have already marked it as
> a Hot Spare from bioctl, pull the Hot Spare-marked drive, and replace it
> (it can be the same physical disk). At that point the disk should show up
> as 'Unused', and you can then mark it as a Hot Spare from the LSI boot
> menu.
>
> This is an improvement over my 2006 analysis of the situation, where I
> couldn't find a way to reset the drive back to Unused (after Hot Sparing it
> from bioctl). The LSI boot menu requires a drive to be in an Unused state
> before it will allow me to correctly mark it as a Hot Spare.
>
>
> If you're interested, please let me know what I can do to be of assistance
> in troubleshooting this. I have a limited window before this box will
> have to be pushed into production, and I can live with the current
> situation (an after-hours reboot in the case of a drive failure is
> perfectly fine).
>
> Matthew
>
>
> Test case
> =========
> s => step succeeded
> F => step failed
>
> Normal case (RAID 1 + one hot spare)
> -----------
> s Configure array from the LSI boot menu
>   s Clear configuration
>   s New configuration
>     s Disks 0, 1: RAID 1 array
>     s Disk 2: Hot spare
>
> s Install OpenBSD-4.2
>
> s Single disk failure
>   s Disk 0: Fails (I pulled it from the hot swap cage)
>   s Disk 2: Automatically replaces it
>   s Observe the RAID 1 array get fully rebuilt
>
> s Replace failed disk
>   s Replace Disk 0 with a new disk
>   s Observe that Disk 0 is marked as "Unused" through bioctl
>   s Set Disk 0 to be a hot spare (through bioctl)
>
> s Single disk failure
>   s Disk 1: Fails (I pulled it)
>   F Disk 0: FAILS TO GET INTEGRATED, DESPITE STILL BEING MARKED AS A
>     HOT SPARE - Array is still degraded.
>
> s Reboot, enter into the LSI boot menu
>   s Configure > View/Add Configuration
>   s Highlight disk 0 > F4 (hot spare)
>   s "This Physical Drive is already a HOTSPARE\nPress any key to
>     continue"
>   s F10 (Configure), Esc, Esc
>   s "Exit?" = YES
>   s "Please REBOOT YOUR SYSTEM", CTRL-ALT-DEL
>
> s Recheck array
>   F Disk 0: Still failing to integrate. Array still degraded.
>
> s Attempt to shake loose the 'Hot Spare' bit from disk 0
>   s Remove disk 0
>   s Replace disk 0 (with the same physical disk)
>   s Disk 0 is *no longer* marked as a 'Hot Spare' (either through
>     bioctl or through the LSI boot menu). Yeah! :)
>     [I don't think I tested this method with my SATA 300-8x.]
>
>
> Log file
> ========
> # The output is generated by:
> #   date; bioctl ami0
>
> ##############################################################################
> # Created a new RAID 1 array from the LSI boot menu and installed OpenBSD 4.2
> Tue Feb 19 04:01:42 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 3% done
>       0 Online       146811125760 0:0.0   safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Tue Feb 19 10:02:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 94% done
>       0 Online       146811125760 0:0.0   safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Tue Feb 19 10:12:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Scrubbing    146695782400 sd0     RAID1 97% done
>       0 Online       146811125760 0:0.0   safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Mirroring complete
> Tue Feb 19 10:22:16 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:0.0   safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Pulling Drive 0:0.0
> Tue Feb 19 16:15:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:0.0   safte0 <MAXTOR ATLAS10K5_146SCAJNZM>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # LSI boot menu-defined 'Hot Spare' has been integrated
> Tue Feb 19 16:15:26 MST 2008
> Volume  Status               Size Device
>  ami0 0 Rebuild      146695782400 sd0     RAID1 0% done
>       0 Rebuild      146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> Tue Feb 19 17:06:14 MST 2008
> Volume  Status               Size Device
>  ami0 0 Rebuild      146695782400 sd0     RAID1 18% done
>       0 Rebuild      146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> Tue Feb 19 20:46:38 MST 2008
> Volume  Status               Size Device
>  ami0 0 Rebuild      146695782400 sd0     RAID1 98% done
>       0 Rebuild      146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> Tue Feb 19 20:56:39 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> ##############################################################################
> # Mirroring complete
> Tue Feb 19 21:06:40 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> Tue Feb 19 21:46:45 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>
> ##############################################################################
> # Replaced 0:0.0
> Tue Feb 19 21:49:59 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Unused       146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Marking 0:0.0 as Hot Spare from bioctl (bioctl -H 0:0.0 ami0)
> Tue Feb 19 21:51:56 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Unused       146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Tue Feb 19 21:52:07 MST 2008
> Volume  Status               Size Device
>  ami0 0 Online       146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Online       146811125760 0:1.0   safte0 <SEAGATE ST3146807LC DS09>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Pulling 0:1.0
> Tue Feb 19 21:53:02 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # bioctl-defined Hot Spare 0:0.0 has failed to integrate
> Tue Feb 19 21:53:15 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Tue Feb 19 21:53:37 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Tue Feb 19 22:06:04 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # System rebooted - no change
> Tue Feb 19 22:25:56 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> Wed Feb 20 00:50:21 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Hot spare    146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Pulling drive 0 (in an attempt to undo the bioctl hot sparing)
> Wed Feb 20 00:50:44 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>
> ##############################################################################
> # Replaced drive 0
> Wed Feb 20 00:51:07 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Unused       146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Success! That drive is back to a status of Unused!!
> Wed Feb 20 00:51:18 MST 2008
> Volume  Status               Size Device
>  ami0 0 Degraded     146695782400 sd0     RAID1
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Failed       146811125760 0:1.0   safte0 <>
>  ami0 1 Unused       146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Rebooted and set drive 0 to Hot Spare from the LSI boot menu
> Wed Feb 20 01:08:06 MST 2008
> Volume  Status               Size Device
>  ami0 0 Rebuild      146695782400 sd0     RAID1 1% done
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Rebuild      146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
> ##############################################################################
> # Success! The array is rebuilding!
> Wed Feb 20 01:18:07 MST 2008
> Volume  Status               Size Device
>  ami0 0 Rebuild      146695782400 sd0     RAID1 5% done
>       0 Online       146811125760 0:2.0   safte0 <IBM IC35L146UCDY10-0S27F>
>       1 Rebuild      146811125760 0:0.0   safte0 <IBM IC35L146UCDY10-0S27F>
>
>
>
>
> System configuration (dmesg)
> =====
> OpenBSD 4.2 (GENERIC.MP) #252: Tue Aug 28 10:53:04 MDT 2007
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> cpu0: Intel(R) Xeon(TM) CPU 2.40GHz ("GenuineIntel" 686-class) 2.40 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
> real mem  = 2146861056 (2047MB)
> avail mem = 2068238336 (1972MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 02/21/05, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xfaf40 (71 entries)
> bios0: vendor Dell Computer Corporation version "A14" date 02/21/2005
> bios0: Dell Computer Corporation PowerEdge 2600
> pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfc160/224 (12 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801CA LPC" rev 0x00)
> pcibios0: PCI bus #11 is the last bus
> bios0: ROM list: 0xc0000/0x8000 0xc8000/0x2200 0xec000/0x4000!
> acpi at mainbus0 not configured
> ipmi0 at mainbus0: version 1.0 interface BT iobase 0xe4/3 spacing 1 irq 10
> mainbus0: Intel MP Specification (Version 1.4)
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: apic clock running at 132 MHz
> cpu1 at mainbus0: apid 6 (application processor)
> cpu1: Intel(R) Xeon(TM) CPU 2.40GHz ("GenuineIntel" 686-class) 2.40 GHz
> cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
> mainbus0: bus 0 is type PCI
> mainbus0: bus 1 is type PCI
> mainbus0: bus 2 is type PCI
> mainbus0: bus 3 is type PCI
> mainbus0: bus 4 is type PCI
> mainbus0: bus 5 is type PCI
> mainbus0: bus 6 is type PCI
> mainbus0: bus 7 is type PCI
> mainbus0: bus 8 is type PCI
> mainbus0: bus 9 is type PCI
> mainbus0: bus 10 is type PCI
> mainbus0: bus 11 is type PCI
> mainbus0: bus 12 is type ISA
> ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
> ioapic0: misconfigured as apic 0, remapped to apid 8
> ioapic1 at mainbus0: apid 9 pa 0xfec80000, version 20, 24 pins
> ioapic1: misconfigured as apic 0, remapped to apid 9
> ioapic2 at mainbus0: apid 10 pa 0xfec81000, version 20, 24 pins
> ioapic2: misconfigured as apic 0, remapped to apid 10
> ioapic3 at mainbus0: apid 11 pa 0xfec82000, version 20, 24 pins
> ioapic3: misconfigured as apic 0, remapped to apid 11
> ioapic4 at mainbus0: apid 12 pa 0xfec82800, version 20, 24 pins
> ioapic4: misconfigured as apic 0, remapped to apid 12
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "Intel E7501 MCH Host" rev 0x01
> ppb0 at pci0 dev 2 function 0 "Intel E7500 MCH" rev 0x01
> pci1 at ppb0 bus 1
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 28 function 0 not configured
> ppb1 at pci1 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci2 at ppb1 bus 2
> fxp0 at pci2 dev 2 function 0 "Intel 8255x" rev 0x0d, i82550: apic 9 int 0 (irq 5), address 00:02:b3:e8:25:b2
> inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci1 dev 30 function 0 not configured
> ppb2 at pci1 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci3 at ppb2 bus 3
> em0 at pci3 dev 1 function 0 "Intel PRO/1000XT (82544GC)" rev 0x02: apic 9 int 4 (irq 5), address 00:0f:1f:67:39:ea
> ppb3 at pci0 dev 3 function 0 "Intel E7500 MCH" rev 0x01
> pci4 at ppb3 bus 4
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci4 dev 28 function 0 not configured
> ppb4 at pci4 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci5 at ppb4 bus 5
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci4 dev 30 function 0 not configured
> ppb5 at pci4 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci6 at ppb5 bus 6
> ppb6 at pci0 dev 4 function 0 "Intel E7500 MCH" rev 0x01
> pci7 at ppb6 bus 7
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci7 dev 28 function 0 not configured
> ppb7 at pci7 dev 29 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci8 at ppb7 bus 8
> ami0 at pci8 dev 8 function 0 "Dell PERC 4/Di i960" rev 0x01: apic 11 int 0 (irq 11)
> ami0: Dell 123, 64b/lhc, FW 251X, BIOS v1.07, 128MB RAM
> ami0: 2 channels, 0 FC loops, 1 logical drives
> scsibus0 at ami0: 40 targets
> sd0 at scsibus0 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
> sd0: 139900MB, 17834 cyl, 255 head, 63 sec, 512 bytes/sec, 286515200 sec total
> scsibus1 at ami0: 16 targets
> safte0 at scsibus1 targ 6 lun 0: <PE/PV, 1x6 SCSI BP, 1.1> SCSI2 3/processor fixed
> scsibus2 at ami0: 16 targets
> "Intel 82870P2 IOxAPIC" rev 0x04 at pci7 dev 30 function 0 not configured
> ppb8 at pci7 dev 31 function 0 "Intel 82870P2 PCIX-PCIX" rev 0x04
> pci9 at ppb8 bus 10
> uhci0 at pci0 dev 29 function 0 "Intel 82801CA/CAM USB" rev 0x02: apic 8 int 16 (irq 11)
> ppb9 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x42
> pci10 at ppb9 bus 11
> vga1 at pci10 dev 4 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801CA LPC" rev 0x02: 24-bit timer at 3579545Hz
> pciide0 at pci0 dev 31 function 1 "Intel 82801CA IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus3 at atapiscsi0: 2 targets
> cd0 at scsibus3 targ 0 lun 0: <TEAC, CD-224E, K.9A> SCSI0 5/cdrom removable
> cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
> pciide0: channel 1 disabled (no drives)
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0: Intel UHCI root hub, rev 1.00/1.00, addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> fd1 at fdc0 drive 1: density unknown
> pctr: user-level cycle counter enabled
> mtrr: Pentium Pro MTRR support
> dkcsum: sd0 matches BIOS drive 0x80
> root on sd0a swap on sd0b dump on sd0b
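
For anyone trying to reproduce this, the symptom in the quoted log (a volume that stays Degraded while a disk on the controller is still listed as Hot spare) could be checked mechanically. The following is only a rough Python sketch, assuming the column layout shown in the `date; bioctl ami0` output above. The names `parse_rows` and `spare_not_integrated` are made up for illustration and are not part of bioctl or any OpenBSD tool.

```python
import re

# One bioctl row: an optional controller name ("ami0"), an index, a status,
# a size in bytes and a device ("sd0" for a volume, "0:2.0" for a disk).
# Assumption: the layout matches the quoted log; other versions may differ.
ROW = re.compile(
    r"^\s*(?:(?P<ctl>[a-z]+\d+)\s+)?(?P<idx>\d+)\s+"
    r"(?P<status>Hot spare|\w+)\s+(?P<size>\d+)\s+(?P<dev>\S+)"
)


def parse_rows(text):
    """Return one dict per volume or disk row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def spare_not_integrated(text):
    """True if a volume is Degraded while a disk is still a Hot spare."""
    rows = parse_rows(text)
    degraded = any(
        bool(r["ctl"]) and r["dev"].startswith("sd") and r["status"] == "Degraded"
        for r in rows
    )
    spare = any(r["status"] == "Hot spare" for r in rows)
    return degraded and spare


# Two snapshots taken from the quoted log (column spacing is not significant).
STUCK = """\
Tue Feb 19 22:06:04 MST 2008
Volume Status Size Device
ami0 0 Degraded 146695782400 sd0 RAID1
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Failed 146811125760 0:1.0 safte0 <>
ami0 1 Hot spare 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
"""

REBUILDING = """\
Wed Feb 20 01:08:06 MST 2008
Volume Status Size Device
ami0 0 Rebuild 146695782400 sd0 RAID1 1% done
0 Online 146811125760 0:2.0 safte0 <IBM IC35L146UCDY10-0S27F>
1 Rebuild 146811125760 0:0.0 safte0 <IBM IC35L146UCDY10-0S27F>
"""

if __name__ == "__main__":
    print(spare_not_integrated(STUCK))
    print(spare_not_integrated(REBUILDING))
```

A single positive result right after a disk is pulled may just be the moment before the firmware starts the rebuild (in the log, the boot-menu-defined spare was already rebuilding 11 seconds after the last Online snapshot), so only a condition that persists over several samples should be read as the problem reported here.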