> From: Bill Bogstad [mailto:[email protected]]
> 
> > Truth is:  Hardware mirroring doesn't provide data integrity.  But software
> > mirroring with btrfs/zfs does indeed provide data integrity.
> 
> For purposes of this email:
> 
> data loss: you don't get any data
> data integrity: you get data, but it isn't what you wrote to the storage 
> system
> 
> Mirroring will help prevent data loss, but not help with data integrity.
> (Unless you read both copies and compare at which point you
> have converted a data integrity event into a possible data loss event).

Seriously dude?

The way ZFS and BTRFS behave is as follows:  The filesystem is aware of 
multiple redundant copies of data.  In normal operation, the filesystem tries 
to read non-overlapping blocks from multiple devices in parallel to increase 
performance.  All data is checksummed, so if any of the data read is in fact 
corrupt, the corruption is detected.  When corruption is detected on a device,
the filesystem reads an alternate redundant copy to retrieve valid data,
rewrites the bad block on the failing device with that valid data, and
increments the cksum failure count for the device.  If a device accumulates
too many cksum failures, it is marked bad.

So indeed, software mirroring with btrfs and zfs *does* provide data integrity.
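
To make the read path concrete, here is a rough Python sketch of a checksummed
mirror.  It is only an illustration, not how ZFS or btrfs are implemented:
Device, ChecksummedMirror and MAX_CKSUM_ERRORS are names I made up, crc32 is a
stand-in for the real checksum algorithms, and keeping the checksums in a
separate dict is a simplification of keeping them apart from the data they
cover.

```python
import zlib


class Device:
    """Toy block device (hypothetical): maps block number -> bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.cksum_errors = 0
        self.faulted = False


class ChecksummedMirror:
    # Assumed threshold for illustration; not the real ZFS/btrfs policy.
    MAX_CKSUM_ERRORS = 3

    def __init__(self, devices):
        self.devices = devices
        self.checksums = {}  # block number -> checksum, stored apart from data

    def write(self, blkno, data):
        self.checksums[blkno] = zlib.crc32(data)
        for dev in self.devices:
            dev.blocks[blkno] = data

    def read(self, blkno):
        expected = self.checksums[blkno]
        bad = []
        for dev in self.devices:
            if dev.faulted:
                continue
            data = dev.blocks.get(blkno)
            if data is not None and zlib.crc32(data) == expected:
                # Found a valid copy: rewrite the copies that failed.
                for bad_dev in bad:
                    if not bad_dev.faulted:
                        bad_dev.blocks[blkno] = data
                return data
            # Checksum mismatch: count it against this device.
            dev.cksum_errors += 1
            if dev.cksum_errors >= self.MAX_CKSUM_ERRORS:
                dev.faulted = True
            bad.append(dev)
        raise IOError("block %d: no copy matches its checksum" % blkno)


if __name__ == "__main__":
    a, b = Device("a"), Device("b")
    mirror = ChecksummedMirror([a, b])
    mirror.write(0, b"hello")
    a.blocks[0] = b"hellp"           # silent corruption on one side
    print(mirror.read(0))            # valid data comes back from the other side
    print(a.blocks[0], a.cksum_errors)
```

The point of the sketch is that the caller never sees the corrupt copy: the
read either returns data that matches its checksum or fails outright.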

Hardware raid, on the other hand, doesn't do checksumming.  It just does device 
block mapping.  When the OS requests a particular block of data, there is no 
way to know which individual device in the raid array actually served up the 
data.  If one device inside a hardware raid set holds corrupt data, then it's
possible to read the same block over and over again and get different data
depending on which device happens to serve each read.
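
For contrast, here is a minimal sketch of a mirror with no checksums.
PlainMirror is a hypothetical name, and the round-robin read policy is an
assumption for the sake of the example; a real controller picks the device
according to its own policy, which the OS can't see.

```python
import itertools


class PlainMirror:
    """Toy mirror without checksums: it only maps blocks onto devices."""

    def __init__(self, n_devices=2):
        self.devices = [dict() for _ in range(n_devices)]
        # Assumed read policy: alternate between devices on each read.
        self._next = itertools.cycle(range(n_devices))

    def write(self, blkno, data):
        for dev in self.devices:
            dev[blkno] = data

    def read(self, blkno):
        # Whatever the chosen device holds is returned as-is; nothing here
        # can tell a good copy from a bad one.
        return self.devices[next(self._next)][blkno]


if __name__ == "__main__":
    mirror = PlainMirror()
    mirror.write(7, b"payroll")
    mirror.devices[1][7] = b"pbyroll"    # silent corruption on one side
    for _ in range(4):
        print(mirror.read(7))            # alternates between the two copies
```

Both answers look equally valid to the caller, which is exactly the problem.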

In software (btrfs and zfs) you should periodically scrub.  In fact, this is
something that would be good on *all* raid sets; it's just not available on
hardware raid.  What a scrub does is this:  It reads all redundant copies of
all data on all devices, searching for cksum failures anywhere in the
storage, and attempts to correct them as described above.
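
As a sketch of what a scrub amounts to, assuming each device is modeled as a
dict of block number -> bytes and the checksums live in a separate dict
(both are simplifications of mine, and crc32 is again a stand-in):

```python
import zlib


def scrub(devices, checksums):
    """Check every copy of every block and repair bad copies from a good one.

    devices:   list of dicts, block number -> bytes (one dict per device)
    checksums: dict, block number -> expected crc32
    Returns (repaired, unrecoverable): a list of (device index, block number)
    pairs that were rewritten, and a list of block numbers with no good copy.
    """
    repaired, unrecoverable = [], []
    for blkno, expected in checksums.items():
        good = None
        bad = []
        for idx, dev in enumerate(devices):
            data = dev.get(blkno)
            if data is not None and zlib.crc32(data) == expected:
                if good is None:
                    good = data
            else:
                bad.append(idx)
        if not bad:
            continue
        if good is None:
            # Every copy failed its checksum; nothing left to repair from.
            unrecoverable.append(blkno)
            continue
        for idx in bad:
            devices[idx][blkno] = good
            repaired.append((idx, blkno))
    return repaired, unrecoverable
```

Unlike a normal read, this touches every copy on every device, so a bad copy
can't hide behind a good one.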

If you don't do a scrub, it's possible for one side of a mirror to have corrupt
data, silently.  By bad luck, you always read from the good device whenever you
read that block, so you never detect the corrupted device.  And then, by bad
luck, the good side of the mirror suffers hardware failure.  Only after the
hardware failure do you actually read the corrupt data and discover the
corruption.  This could have been avoided if you had performed a scrub while
a redundant copy of good data still existed.
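
The same sequence of events as a small simulation, with and without a scrub
before the good disk dies.  The function names and the two-dict "mirror" are
made up for illustration:

```python
import zlib


def read_verified(devices, blkno, expected):
    """Return the first copy whose checksum matches, or None if none does."""
    for dev in devices:
        data = dev.get(blkno)
        if data is not None and zlib.crc32(data) == expected:
            return data
    return None


def simulate(scrub_before_failure):
    payload = b"important data"
    expected = zlib.crc32(payload)
    good, flaky = {0: payload}, {0: payload}
    flaky[0] = b"important dat\x00"      # silent corruption on one side

    # Day-to-day reads happen to be served by the good side, so the bad
    # copy is never looked at and nothing is noticed.

    if scrub_before_failure:
        # A scrub checks every copy, finds the bad one, and rewrites it
        # from the copy that still verifies.
        if zlib.crc32(flaky[0]) != expected:
            flaky[0] = read_verified([good], 0, expected)

    # The good side suffers hardware failure; only the other device is left.
    survivors = [flaky]
    return read_verified(survivors, 0, expected)


if __name__ == "__main__":
    print(simulate(scrub_before_failure=True))    # the original payload
    print(simulate(scrub_before_failure=False))   # None: no valid copy left
```

Without the scrub, the checksum still catches the corruption, but by then
there is nothing left to repair it from.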
_______________________________________________
Discuss mailing list
[email protected]
http://lists.blu.org/mailman/listinfo/discuss
