Hi Neil,

I've been following this thread with interest and I have a few questions.

Neil Brown wrote:
On Monday March 5, [EMAIL PROTECTED] wrote:

Neil Brown wrote:

When a disk fails we know what to rewrite, but when we discover a mismatch
we do not have this knowledge. It may corrupt the good copy of a raid1.

If a block differs between the different drives in a raid1, then no
copy is 'good'.  It is possible that one copy is the one you think you
want, but you probably wouldn't know by looking at it.
The worst situation is to have inconsistent data. If you read and get
one value, then later read and get another value, that is really bad.

For raid1 we 'fix' an inconsistency by arbitrarily choosing one copy
and writing it over all other copies.
For raid5 we assume the data is correct and update the parity.
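
As a toy illustration of the two policies just described (a sketch only,
not the md driver's actual code; the function names are hypothetical and
blocks are modelled as Python bytes objects of equal length):

```python
from functools import reduce
from operator import xor


def repair_raid1(copies):
    """Toy raid1 'repair': pick one copy arbitrarily (here, the first)
    and write it over all the other copies."""
    chosen = copies[0]
    return [chosen for _ in copies]


def repair_raid5(data_blocks):
    """Toy raid5 'repair': assume the data blocks are correct and
    recompute the parity block as their byte-wise XOR."""
    return bytes(reduce(xor, column) for column in zip(*data_blocks))
```

In both cases the result is consistent, but nothing here can tell
whether the surviving data is the data you actually wanted.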

Wouldn't it be better to signal an error rather than potentially
corrupt data - or perhaps this already happens? Does the above only
refer to a 'repair' action?

I'm worrying here about silent data corruption that gets on to my
backup tapes. If an error was (is?) signaled by the raid system
during the backup and could be tracked to the file being copied at
the time, it would allow recovery of the data from a prior
backup. If raid remains silent, the corrupted data eventually
gets copied onto my entire backup rotation. Can you comment on this?

FWIW, my 600GB raid5 array shows a mismatch_cnt of 24 when I 'check'
it - that machine has hung up on occasion.
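
For reference, a minimal sketch of how a backup script might look at
that counter before it starts. It assumes the array is md0 and that the
md sysfs directory is /sys/block/md0/md; the function names are made up
for illustration. The value only reflects the most recent 'check' run,
and it is a count for the whole array, so it cannot point at the file
that is affected.

```python
from pathlib import Path


def read_mismatch_cnt(md_sysfs_dir):
    """Return the integer stored in <md_sysfs_dir>/mismatch_cnt."""
    return int((Path(md_sysfs_dir) / "mismatch_cnt").read_text().strip())


def safe_to_back_up(md_sysfs_dir="/sys/block/md0/md"):
    """Hypothetical policy: only back up when the last check found
    no mismatches. The default path assumes the array is md0."""
    return read_mismatch_cnt(md_sysfs_dir) == 0
```

A backup job could call safe_to_back_up() first and skip the tape
rotation (or at least warn) when it returns False.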

Cheers,
Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
