On 2015-12-22 05:23, Duncan wrote:
Kai Krakow posted on Tue, 22 Dec 2015 02:48:04 +0100 as excerpted:

I just wondered if btrfs allows for the case where both stripes could
have valid checksums despite btrfs RAID - just because a failure
occurred right on the spot.

Is this possible? What happens then? If yes, it would mean not to
blindly trust the RAID without doing the homework.

The one case where btrfs could get things wrong that I know of is as I
discovered in my initial pre-btrfs-raid1-deployment testing...
I've had exactly one case where I got _really_ unlucky and hit a bunch of media errors on a BTRFS raid1 setup that resulted in something similar to this. Things lined up such that one copy of a data block (call it copy 1) had correct data and the other (copy 2) had incorrect data, except that one copy of the metadata held the correct checksum for copy 2, the other metadata copy held the correct checksum for copy 1, and, due to a hash collision, the checksum for the metadata block itself was correct for both copies.

As a result, I ended up getting a read error about 25% of the time (a mismatched pairing on the first read forces a re-read, which can mismatch again), the correct data about 37.5% of the time, and the incorrect data the remaining 37.5% of the time.

I actually ran the numbers on how likely this was to happen (more than a dozen errors on different disks in blocks that happened to reference each other, plus a hash collision involving a 4-byte difference between two 16k blocks of data), and it's a statistical impossibility: it's more likely that one of Amazon's or Google's data centers goes offline due to hardware failures than that this happens again. Obviously it did happen, but I would say it's such an unrealistic edge case that you probably don't need to worry about it (although I learned _a lot_ about the internals of BTRFS while trying to figure out what was going on).
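For anyone wanting to check the 25% / 37.5% / 37.5% figures, here's a minimal sketch that enumerates the outcomes exactly. It assumes the model described above (each attempt independently pairs one of two data copies with one of two metadata checksums, with one retry on mismatch); it is not BTRFS code, just arithmetic:

```python
from fractions import Fraction

half = Fraction(1, 2)

# Each read attempt independently picks one of the two data copies and one
# of the two metadata checksums. A "mismatched" pairing (copy 1 paired with
# the checksum for copy 2, or vice versa) fails verification and forces a
# re-read; a matched pairing passes and returns whichever copy was picked.
p_mismatch = half                 # chance a single attempt pairs copy and checksum wrongly

p_error = p_mismatch * p_mismatch # both the read and the single re-read mismatch
p_data = 1 - p_error              # some copy passes verification and is returned
p_correct = p_data * half         # the returned copy is copy 1 (good data)
p_incorrect = p_data * half       # the returned copy is copy 2 (bad data)

print(p_error, p_correct, p_incorrect)  # 1/4 3/8 3/8
```

That reproduces the observed split: a 1/4 chance of a read error, and 3/8 each for good and bad data.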

[...snip...]

From all I know and from everything others told me when I asked at the
time, which copy you get then is entirely unpredictable, and worse yet,
you might get btrfs acting on divergent metadata when writing to the
other device.

This is indeed the case. Because of how BTRFS verifies checksums, there's a roughly 50% chance that the first read attempt picks a mismatched checksum and data block, which triggers a re-read with an independent 50% chance of again picking a mismatch; the net result is a 25% chance that any read that actually goes to the device returns a read error. The remaining 75% of the time, you get either one block or the other. These numbers of course get skewed by the VFS cache. In my case above, the affected file was almost never in cache when accessed, so I saw numbers relatively close to what you would get without the cache.
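The retry behaviour described above can be sanity-checked with a toy Monte Carlo sketch. This simulates the model from this thread (random copy paired with a random checksum, one retry on mismatch); it makes no attempt to mirror the actual kernel code paths:

```python
import random

def read_block():
    """One read from the degraded mirror described above: each attempt
    independently pairs a random data copy with a random metadata checksum."""
    for _ in range(2):                # initial read plus one re-read
        copy = random.choice((1, 2))  # which data copy the read hits
        csum = random.choice((1, 2))  # which metadata checksum verifies it
        if copy == csum:              # pairing matches: data is returned
            return copy               # copy 1 = good data, copy 2 = bad data
    return None                       # both attempts mismatched: read error

random.seed(0)
trials = 100_000
results = [read_block() for _ in range(trials)]
print(results.count(None) / trials)  # ~0.25  read errors
print(results.count(1) / trials)     # ~0.375 good data
print(results.count(2) / trials)     # ~0.375 bad data
```

The simulated frequencies land close to the 25% / 37.5% / 37.5% split reported above.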
