Hi Andy,
> In a raid configuration with a spare disk, if one of the active disks
> fails, then the spare disk is automatically configured to replace the
> failed disk. If the failed disk is then replaced with raidhotremove /
> raidhotadd, this disk now becomes the spare disk. If this happens
> should I modify the raidtab file since it is now not consistent with
> the state of the raid? The spare-disk device in the raidtab
> file is not really the spare disk.
Yes, you should. Having a raidtab file that correctly reflects the actual
state of your raid array can be a real lifesaver; see below.
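As a rough illustration, here is what an updated raidtab might look like
after such a swap. Everything in it is made up for the example (a 3-disk
raid5 on /dev/md0, the device names, chunk size and parity algorithm), and
I am assuming the old spare took over the slot number of the failed disk,
so check /proc/mdstat and the kernel log for what your array really looks
like before copying any of it:

```
# Hypothetical example: /dev/sdb1 (formerly raid-disk 1) failed,
# the old spare /dev/sdd1 took over its slot, and the replaced
# /dev/sdb1 was added back with raidhotadd, so it is now the spare.
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          1
    persistent-superblock   1
    parity-algorithm        left-symmetric
    chunk-size              32
    device                  /dev/sda1
    raid-disk               0
    device                  /dev/sdd1
    raid-disk               1
    device                  /dev/sdc1
    raid-disk               2
    device                  /dev/sdb1
    spare-disk              0
```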
> A related question I have is what happens if 3 drives are marked as bad
> in a raid5 + hot spare? This happened to me due to hardware problems
> (most likely APIC errors). Although the disks are actually likely to be
> fine, I couldn't start the raid once the disks were marked as bad. I
> couldn't do raidhotadd or raidhotremove since it said the raid wasn't
> active, but then I couldn't start the raid either since too many
> devices were marked as failed. It seems like something like this could
> happen if there is a loose scsi cable or some other hardware problem.
> Although the disks may be fine, once too many are marked as failed, it
> seems you're stuck.
You can get out of this fix, but it's not pretty: mkraid --force will not
modify any data on the disks except for the raid superblock, so if you use
it to recreate your array you can access your data again.
There are some things you should be careful about if you try this:
* Be VERY careful that your raidtab specifies a configuration that
EXACTLY matches the state the array was running in previously (order of
disks, chunk size etc.).
* Mark one disk as failed in raidtab (use "failed-disk" instead of
"raid-disk"). For this, use the disk that was kicked from the array
first, i.e. the disk least likely to contain good data.
The reason for the "failed-disk" entry is that you don't want the
background resync/parity generation to start when you run mkraid: if there
WAS an error in your raidtab and you can't access the newly created array,
you can try again with a fixed config. Without "failed-disk", the background
resync would start writing to the disks and would definitely fry your data
if the raidtab was wrong.
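To make that concrete, here is a sketch of what such a recovery raidtab
could look like. All the specifics are assumptions for the sake of the
example: a 4-disk raid5 on /dev/md0, the device names, the chunk size and
parity algorithm, and /dev/sdb1 being the disk that was kicked first. I
have also left the spare out entirely, on the assumption that a listed
spare would trigger a reconstruction onto it, which is exactly the kind of
background write you want to avoid here. The commands in the comments are
only a suggested order of doing things, not something to paste blindly:

```
# Hypothetical recovery raidtab: must match the ORIGINAL disk order,
# chunk size and parity algorithm of the array exactly.
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           4
    nr-spare-disks          0
    persistent-superblock   1
    parity-algorithm        left-symmetric
    chunk-size              32
    device                  /dev/sda1
    raid-disk               0
    device                  /dev/sdb1
    failed-disk             1      # kicked first, so least trustworthy
    device                  /dev/sdc1
    raid-disk               2
    device                  /dev/sdd1
    raid-disk               3

# Possible sequence afterwards (read mkraid's warning carefully, it may
# ask you for a stronger confirmation option than --force):
#   mkraid --force /dev/md0
#   mount -o ro /dev/md0 /mnt       # look at the data read-only first
# Only once the data looks right, bring the last disk back in:
#   raidhotadd /dev/md0 /dev/sdb1
```

If the data does not look right after the read-only mount, stop the array,
fix the raidtab and try again; as long as no resync has run, only the
superblocks have been rewritten.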
Bye, Martin