On 20/5/2014 5:07 AM, Russell Coker wrote:
On Mon, 19 May 2014 23:47:37 Brendan Hide wrote:
This is extremely difficult to measure objectively. Subjectively ... see
below.

[snip]

*What other failure modes* should we guard against?
I know I'd sleep a /little/ better at night knowing that a double disk
failure on a "raid5/1/10" configuration might ruin a ton of data along
with an obscure set of metadata in some "long" tree paths - but not the
entire filesystem.
My experience is that most disk failures that don't involve extreme physical
damage (e.g. dropping a drive on concrete) don't result in the total loss of the
disk.  Much of the discussion about RAID failures concerns entirely failed
disks, but I believe that is because RAID implementations such as Linux
software RAID entirely remove a disk once it starts giving errors.

I have a disk which had ~14,000 errors, of which ~2,000 were corrected by
duplicate metadata.  If two disks with that problem were in a RAID-1 array
then duplicate metadata would be a significant benefit.

The other use-case/failure mode - where you are somehow unlucky enough
to have sets of bad sectors/bitrot on multiple disks that simultaneously
affect the only copies of the tree roots - is extremely unlikely. As
unlikely as it may be, though, its consequences are very painful despite
VERY little actual corruption. That is where the
peace-of-mind/bragging rights come in.
http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html

The NetApp research on latent errors on drives is worth reading.  On page 12
they report latent sector errors on 9.5% of SATA disks per year.  So if you
lose one disk entirely the risk of having errors on a second disk is higher
than you would want for RAID-5.  While losing the root of the tree is
unlikely, losing a directory in the middle that has lots of subdirectories is
a risk.
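To put that 9.5%-per-year figure in perspective, here is a rough back-of-the-envelope sketch. The array sizes are my own examples, it assumes latent errors hit disks independently, and the per-year rate is only a crude proxy for the exposure during a rebuild:

# Rough illustration of the point above, not a result from the paper: if ~9.5%
# of SATA disks develop latent sector errors per year, the chance that at least
# one surviving disk in a degraded RAID-5 carries such an error grows quickly
# with array size. The array sizes below are example values.
p = 0.095                            # per-disk annual latent-error rate (NetApp paper, p.12)

for disks in (4, 6, 8, 12):
    survivors = disks - 1            # one disk already lost entirely
    p_any = 1 - (1 - p) ** survivors
    print("%d-disk RAID-5, one dead: P(latent error on a survivor) ~ %.0f%%"
          % (disks, 100 * p_any))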
Given the results of that paper, I think erasure coding is a better solution. Instead of keeping many copies of metadata or data, we could erasure-code them with something like zfec [1], which is used by Tahoe-LAFS, increasing their size by, let's say, 5-10%, and still be quite safe even against multiple contiguous bad sectors.

[1] https://pypi.python.org/pypi/zfec
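As a concrete sketch of the idea, here is how a k-of-m scheme could look with zfec's low-level Encoder/Decoder. The call signatures follow my reading of zfec's documentation and of how Tahoe-LAFS drives it, so treat them, the sample file, and the k=10/m=11 choice (~10% overhead, any single lost block recoverable) as assumptions rather than a tested recipe:

import zfec

k, m = 10, 11                        # any 10 of the 11 blocks recover the data: ~10% overhead
data = open("/etc/services", "rb").read()           # arbitrary sample payload

# Split the payload into k equal-sized primary blocks, zero-padding the tail.
blocksize = -(-len(data) // k)                      # ceiling division
padded = data.ljust(k * blocksize, b"\0")
primary = [padded[i * blocksize:(i + 1) * blocksize] for i in range(k)]

# Generate the m - k parity ("check") blocks.
enc = zfec.Encoder(k, m)
check = enc.encode(primary, list(range(k, m)))      # request block numbers k..m-1

# Pretend a run of bad sectors destroyed primary block 3: any k surviving
# blocks, plus their block numbers, are enough to rebuild the payload.
survivors = primary[:3] + primary[4:] + check
survivor_ids = [0, 1, 2] + list(range(4, m))
recovered = zfec.Decoder(k, m).decode(survivors, survivor_ids)

assert b"".join(recovered)[:len(data)] == data      # original payload restored

In a filesystem the blocks would of course live on different devices or well-separated sectors rather than in memory, but the space-overhead arithmetic is the same: m/k times the data instead of 2x or 3x for whole copies.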

I can understand why people wouldn't want ditto blocks to be mandatory.  But
why are people arguing against them as an option?


As an aside, I'd really like to be able to set RAID levels by subtree.  I'd
like to use RAID-1 with ditto blocks for my important data and RAID-0 for
unimportant data.


