On Wednesday May 16, [EMAIL PROTECTED] wrote:
> Hi all,
> 
> I am running software RAID on Linux 2.6.21.
> 
> While experimenting with adding and removing devices from the RAID array, I
> noticed something very troubling. I have a bad drive (let's call it drive B)
> which gets random read errors. I also have a good drive, call it drive A.
> 
> B can synchronize with A. But then, if I remove A from the raid array, A
> cannot be re-added. This is because the bad drive, B, cannot be read from.

Well, if you remove A, then you have the situation of trusting your
data to a single drive.  And we all know that isn't very safe.
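For reference, the remove/re-add cycle described above is typically done with
mdadm along these lines (a sketch only; /dev/md0 and the member device names
are placeholders for your actual array and disks):

```shell
# Mark drive A as failed and pull it out of the array
# (after this, the array runs degraded on drive B alone).
mdadm /dev/md0 --fail /dev/sdaX
mdadm /dev/md0 --remove /dev/sdaX

# Re-add it; md will resync by reading from the remaining
# member (B) -- which is exactly where B's read errors bite.
mdadm /dev/md0 --add /dev/sdaX
```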

> 
> I wonder if anyone else is interested in a "paranoid recovery" mode where the
> md layer tests the data that has been written. Even if this doubles the
> recovery time, I think that it would be desirable for many applications.

I think it is just as easy to run a 'check' pass after recovery has
completed.
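A 'check' pass can be triggered through the md sysfs interface; a minimal
sketch, assuming the array is md0 (run as root):

```shell
# Start a read-and-compare pass over the whole array.
echo check > /sys/block/md0/md/sync_action

# Watch progress in /proc/mdstat; when the check finishes,
# a nonzero mismatch count indicates inconsistent stripes.
cat /sys/block/md0/md/mismatch_cnt
```

With the bad drive described above, the check itself would hit B's read
errors, flagging the problem without waiting for a degraded-mode rebuild.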

Whenever you are commissioning new hardware, it makes sense to do some
testing before you trust it with valuable data.
It sounds like this particular hardware failure lends itself to easy
discovery with a bit of testing - indeed, you found it while testing
your hardware.  So I don't think it is a failure mode we need to put
any extra care into.
It is the failure modes that are hard to find with basic testing that
we should worry about.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
