> From: [email protected] [mailto:discuss-
> [email protected]] On Behalf Of Derek Atkins
> 
> > ZFS prevents write holes by enforcing atomicity of all writes to
> > storage. It does this by controlling all of the I/O caching involved in
> > the write process from system RAM down to the write acceleration cache
> > on the disks themselves. ZFS updates the file system only after all
> > cache points have confirmed being flushed.
> >
> > If any of these points lie about their status then write holes can
> > appear under power fault conditions. 

True, but at least with ZFS and BTRFS, any subsequent read of the corrupt data
will be detected by the checksums.

Also, since we're talking about redundant storage, ZFS (and presumably BTRFS,
since it's the obvious thing to do) will attempt to correct the error.  If a
single disk (or a number smaller than your redundancy protection level) wrote
corrupt data (or no data), then the checksum fails, and the FS will try all
possible combinations of eliminating devices and re-reading, to identify which
device(s) contain corrupt data.  If it finds some combination that produces a
good checksum, it will attempt to rewrite the data to whichever disk(s) failed.
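
To make that concrete, here's a toy sketch of the "eliminate a device and
re-read" search.  It assumes a single-parity (XOR) stripe and a SHA-256
checksum stored apart from the data.  The function names are made up for
illustration, and this is not ZFS's actual code, just the general idea.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(data_blocks):
    """Checksum over the logical data only (parity is not included)."""
    return hashlib.sha256(b"".join(data_blocks)).hexdigest()

def write_stripe(data_blocks):
    """Return (devices, cksum): the data blocks plus one XOR parity block."""
    devices = list(data_blocks) + [xor_blocks(data_blocks)]
    return devices, checksum(data_blocks)

def read_stripe(devices, cksum):
    """Return (data, repaired_index or None), repairing `devices` in place."""
    data = devices[:-1]
    if checksum(data) == cksum:
        return data, None
    # Checksum failed: assume each device in turn is the bad one,
    # rebuild it from the others, and keep the rebuild that verifies.
    for bad in range(len(devices)):
        others = [d for i, d in enumerate(devices) if i != bad]
        candidate = devices[:bad] + [xor_blocks(others)] + devices[bad + 1:]
        if checksum(candidate[:-1]) == cksum:
            devices[bad] = candidate[bad]  # rewrite the failed device
            return candidate[:-1], bad
    raise IOError("no single-device reconstruction matches the checksum")

devices, ck = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
devices[1] = b"BxBB"  # silent corruption: the drive reports no error
data, repaired = read_stripe(devices, ck)
```

With more than one parity device the same search simply has more
combinations to try.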


> Fair enough...  I don't know if standard (e.g. DM-level) RAID5 or RAID6
> provide for said "scrubbing"?  

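Not in the same sense.
A scrub that can pinpoint and repair the bad copy is only possible thanks to
checksumming at the RAID level.  Without that, your RAID depends on the
underlying devices to report errors correctly.  A parity check can show that a
stripe is inconsistent, but not which device is wrong.  And if an error isn't
noticed by the hardware and escalated to the OS, then the error passes
standard RAID undetected.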
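
A toy sketch of why parity alone isn't enough, again assuming a single-parity
XOR stripe with made-up helper names: when one device silently returns bad
data, rebuilding *any* one device from the others yields a self-consistent
stripe, so without a checksum there's nothing to say which rebuild is the
right one.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_consistent(devices):
    """True if the data blocks XOR to the parity block (the last device)."""
    return xor_blocks(devices[:-1]) == devices[-1]

def plausible_repairs(devices):
    """Indices of devices whose rebuild makes the stripe consistent again."""
    repairs = []
    for bad in range(len(devices)):
        others = [d for i, d in enumerate(devices) if i != bad]
        fixed = devices[:bad] + [xor_blocks(others)] + devices[bad + 1:]
        if parity_consistent(fixed):
            repairs.append(bad)
    return repairs

data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]
stripe[1] = b"BxBB"                    # silent corruption on device 1
mismatch = not parity_consistent(stripe)
candidates = plausible_repairs(stripe)  # every device looks like a suspect
```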

How often does that happen?  Well, in my experience, heavy usage on several TB
of enterprise SATA hardware produces a bit error about once every 1-2 years, as
identified by the ZFS cksum counter incrementing without the hard drive error
counter incrementing.  This means the error passed the drive undetected, and
was identified and corrected by ZFS.

_______________________________________________
Discuss mailing list
[email protected]
http://lists.blu.org/mailman/listinfo/discuss