Guy wrote:

> It is not just a parity issue.  If you have a 4 disk RAID 5, you can't be
> sure which if any have written the stripe.  Maybe the parity was updated,
> but nothing else.  Maybe the parity and 2 data disks, leaving 1 data disk
> with old data.
> 
> Beyond that, md does write caching.  I don't think the file system can tell
> when a write is truly complete.  I don't recall ever having a Linux system
> crash, so I am not worried.  But power failures cause the same risk, or
> maybe more.  I have seen power failures, even with a UPS!

Good points there, Guy. I do like your example. I'll go further on
crashing, too: my machines do crash outright occasionally. It's usually
when I'm building out new machines and don't yet know the proper driver
tweaks, or when hardware is failing, but it happens without any power
loss. It's important to get this correct and well understood.

That said, unless I hear otherwise from someone who works in the code,
I think md won't report a write as complete to the upper layers until it
actually is. I don't believe it does write caching. Even if it does, it
must not report completion until some durable representation of the data
is committed to hardware, and the parity must stay marked dirty until
the redundancy is committed.

Building on that, and barring hardware write caching, I think that with
a journalling FS like ext3, and with md only reporting a write complete
when it really is, the FS won't trust anything that hasn't been durably
written to hardware.
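
Seen from userspace, the same "don't trust it until it is acknowledged"
rule looks roughly like the sketch below. This is only an illustration
in Python: durable_write and demo_path are names I made up, and whether
the fsync really reaches stable storage still depends on every layer
underneath (md, the controller, the drive's own cache), which is exactly
the assumption under discussion.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and return only after the kernel acknowledges the flush.

    Hypothetical helper for illustration; it is not part of md or ext3.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Only after fsync returns does the caller treat the data as committed.
        os.fsync(fd)
    finally:
        os.close(fd)


# Write to a scratch directory so the example has no side effects elsewhere.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
durable_write(demo_path, b"committed")
```

If the machine dies before fsync returns, the caller never saw success,
so it has no business assuming the data is there.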

I think that's sufficient to prove consistency across crashes.

For example, even if you crash during an update to a file smaller than a
stripe, the stripe will be dirty so the bad parity will be discarded and
the filesystem won't trust the blocks that didn't get reported back as
written by md. So that file update is lost, but the FS is consistent and
all the data it can reach is consistent with what it thinks is there.
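
To make that concrete, here is a toy Python model of one stripe on a
4 disk RAID 5 (3 data blocks plus XOR parity). It is a sketch of the
argument, not of md's actual code: the Stripe class, the dirty flag and
the crash_after parameter are all invented for the illustration, and I
am assuming the dirty marker itself survives the crash.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Stripe:
    """Toy stripe: 3 data blocks plus 1 parity block."""

    def __init__(self, data):
        self.data = list(data)
        self.parity = xor_blocks(self.data)
        self.dirty = False  # assumed to be recorded before any block changes

    def parity_ok(self):
        return self.parity == xor_blocks(self.data)

    def write(self, new_data, crash_after=None):
        """Return True only if every block and the parity were written.

        crash_after=N simulates a crash once N data blocks have hit disk.
        """
        self.dirty = True
        for i, block in enumerate(new_data):
            if crash_after is not None and i >= crash_after:
                return False  # crash: never acknowledged, stripe stays dirty
            self.data[i] = block
        self.parity = xor_blocks(self.data)
        self.dirty = False
        return True

    def resync(self):
        """After a crash, discard stale parity and recompute it from data."""
        if self.dirty:
            self.parity = xor_blocks(self.data)
            self.dirty = False


old = [b"AAAA", b"BBBB", b"CCCC"]
new = [b"aaaa", b"bbbb", b"cccc"]

stripe = Stripe(old)
acked = stripe.write(new, crash_after=1)   # only the first data block lands
consistent_before = stripe.parity_ok()     # stale parity no longer matches
stripe.resync()
consistent_after = stripe.parity_ok()      # parity rebuilt from what is on disk
```

In this model the write is never acknowledged, the parity is wrong until
the resync, and afterwards the stripe holds one new block and two old
ones with matching parity. The model does not cover a disk failing
before the resync finishes.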

So, I continue to believe silent corruption is mythical. I'm still open
to a good explanation of why it's not, though.

-Mike