On Thu, 15 Aug 2019 09:07:46 -0500 [email protected] wrote:

> On 2019-08-15 00:03, [email protected] wrote:
> > The SiI3214 SATALink card suffers from the identify problem in
> > netbsd-9 and -current (PR kern/54289).
> >
> > Booting a netbsd-9 kernel, the drives failed to identify, which
> > caused RAIDframe to mark the 4 drives on that card (of 8) in my
> > RAID as FAILED. Rebooting netbsd-8, the drives identify properly,
> > but are still marked as FAILED.
> >
> > Is there any way to unmark them so the RAID will configure and
> > recover? Normally 'raidctl -C' is used during first-time
> > configuration. Could it be used to force configuration, ignoring
> > the FAILED status? Would the RAID be recoverable with a parity
> > rebuild afterwards?
>
> This seems to have worked. The drives not being correctly
> identified/attached under netbsd-9 apparently caused them to be
> recorded as failed in the component labels of the disks that did
> attach (on the machine's on-board Intel ahcisata ports). Rebooting
> netbsd-8, although the drives identified and attached properly,
> they were still considered failed components.
>
> A multiple-disk failure is usually fatal to a RAID, but these
> components weren't actually failed. Unconfiguring with 'raidctl -u'
> and then forcing a configuration with 'raidctl -C /path/to/config'
> showed no fatal errors, and a subsequent 'raidctl -s' showed all
> component labels (with serial numbers) intact. The parity rewrite
> took a long time.
>
> Afterwards, 'gpt show raid0d' and 'dkctl raid0d listwedges' showed
> things to be intact that far. Rebooting the machine, the RAID
> autoconfigured properly. 'fsck' reported the filesystem as clean
> (since it never got mounted after the failed reboot into netbsd-9).
> An 'fsck -f' run is in progress.
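For reference, the sequence described above corresponds roughly to the
following commands. The RAID device (raid0) comes from the report; the
config file path and the fsck target are placeholders, not values taken
from it:

    # unconfigure the set that carries the stale FAILED markings
    raidctl -u raid0
    # force configuration from the saved config, ignoring component status
    raidctl -C /path/to/config raid0
    # confirm all component labels (and serial numbers) look sane
    raidctl -s raid0
    # check the parity and rewrite it if needed (the slow part);
    # 'raidctl -i raid0' would rewrite it unconditionally
    raidctl -P raid0
    # watch parity rewrite progress
    raidctl -S raid0
    # confirm the GPT and wedges on the RAID are still intact
    gpt show raid0d
    dkctl raid0d listwedges
    # forced filesystem check; the wedge device name is a placeholder
    fsck -f /dev/rdk0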
In general, with this sort of 'larger' set of failed components, you
should be OK. There are a couple of scenarios for the different RAID
sets you might have configured:

1) The components that 'failed' were not sufficient to fail the RAID
   set (i.e. it was just 'degraded'). In this case the surviving
   components still have your data, but in degraded mode. Rebuild the
   'failed' component, and you're good to go.

2) The components that 'failed' were enough to completely fail the
   RAID set upon configuration. In this case the RAID set would not
   configure, and no data would be written to any of the components
   (save for the updating of the component labels). Here you can use
   'raidctl -C' to reconstruct the RAID set and be comfortable that
   your data is still intact (given that there wasn't actually a real
   failure, and no data was written to the RAID set). Yes, a parity
   rebuild will be needed, and it will be a NOP (but it doesn't know
   that :) ).

The only place this gets tricky is if the RAID set does get configured
and mounted -- in that case you don't want to use 'raidctl -C', as data
on the surviving components will be out of sync with the failed
components. In that case you're better off rebuilding in place.

Later...

Greg Oster
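A minimal sketch of how the two cases above differ in practice,
assuming the set is raid0; the component name (/dev/wd3e) and config
path are placeholders:

    # Case 1: the set configures but is degraded -- rebuild the
    # not-really-failed component in place:
    raidctl -R /dev/wd3e raid0
    raidctl -S raid0                    # watch reconstruction progress

    # Case 2: the set refuses to configure -- force the configuration
    # from the saved config file, then redo parity:
    raidctl -u raid0                    # only if a partial config is still around
    raidctl -C /path/to/config raid0
    raidctl -P raid0                    # parity check/rewrite (effectively a NOP here)
    raidctl -s raid0                    # confirm the components show up as optimal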
