Re: Adventures in btrfs raid5 disk recovery

Andrei Borzenkov Fri, 24 Jun 2016 02:53:48 -0700

On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Fri, Jun 24, 2016 at 07:02:34AM +0300, Andrei Borzenkov wrote:
>> 24.06.2016 04:47, Zygo Blaxell wrote:
>> > On Thu, Jun 23, 2016 at 06:26:22PM -0600, Chris Murphy wrote:
>> >> On Thu, Jun 23, 2016 at 1:32 PM, Goffredo Baroncelli <kreij...@inwind.it> wrote:
>> >>> The raid5 write hole is avoided in BTRFS (and in ZFS) thanks to the checksum.
>> >>
>> >> Yeah I'm kinda confused on this point.
>> >>
>> >> https://btrfs.wiki.kernel.org/index.php/RAID56
>> >>
>> >> It says there is a write hole for Btrfs. But defines it in terms of
>> >> parity possibly being stale after a crash. I think the term comes not
>> >> from merely parity being wrong but parity being wrong *and* then being
>> >> used to wrongly reconstruct data because it's blindly trusted.
>> >
>> > I think the opposite is more likely, as the layers above raid56
>> > seem to check the data against sums before raid56 ever sees it.
>> > (If those layers seem inverted to you, I agree, but OTOH there are
>> > probably good reasons to do it that way).
>> >
>>
>> Yes, that's how I read the code as well. The btrfs layer that does
>> checksumming is unaware of parity blocks at all; for all practical
>> purposes they do not exist. What happens is approximately:
>>
>> 1. logical extent is allocated and checksum computed
>> 2. it is mapped to physical area(s) on disks, skipping over what would
>> be parity blocks
>> 3. when these areas are written out, RAID56 parity is computed and filled in
>>
>> IOW btrfs checksums are for (meta)data and RAID56 parity is not data.
>
>    Checksums are not parity, correct. However, every data block
> (including, I think, the parity) is checksummed and put into the csum
> tree. This allows the FS to determine where damage has occurred,
> rather than simply detecting that it has occurred (which would be the
> case if the parity doesn't match the data, or if the two copies of a
> RAID-1 array don't match).
>


Yes, that is what I wrote below. But that means that RAID5 with one
degraded disk won't be able to reconstruct data on this degraded disk
because reconstructed extent content won't match checksum. Which kinda
makes RAID5 pointless.
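
To make the scenario concrete, here is a toy model in Python. It is only a
sketch, not btrfs code: it assumes a two-data-disk stripe with XOR parity,
assumes checksums cover data blocks only, and uses zlib.crc32 as a stand-in
for the crc32c that btrfs uses. The helper names (xor_blocks, reconstruct,
verify) are made up for this illustration.

```python
import zlib
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length blocks byte by byte (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving, parity):
    """Rebuild the one missing data block from the survivors plus parity."""
    return xor_blocks(parity, *surviving)


def verify(block, csum):
    """Check a block against its stored checksum (CRC32 as a stand-in)."""
    return zlib.crc32(block) == csum


# A consistent stripe: two data blocks, checksums on data only, then parity.
d0, d1 = b"AAAA", b"BBBB"
csum_d1 = zlib.crc32(d1)
parity = xor_blocks(d0, d1)

# Degraded case with consistent parity: the disk holding d1 is gone.
rebuilt = reconstruct([d0], parity)
clean_ok = verify(rebuilt, csum_d1)

# Hypothetical write-hole case: the d0 slot receives new data, but the
# crash happens before parity is updated, so parity is stale.
d0_new = b"CCCC"
stale_rebuilt = reconstruct([d0_new], parity)
stale_ok = verify(stale_rebuilt, csum_d1)

print("consistent parity, rebuilt d1 passes checksum:", clean_ok)
print("stale parity, rebuilt d1 passes checksum:", stale_ok)
```

In this model the rebuilt block passes its checksum only while parity is
consistent with the surviving data. With stale parity the rebuilt block fails
verification, so the damage is detected but cannot be repaired.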

...
>
>> > It looks like uncorrectable failures might occur because parity is
>> > correct, but the parity checksum is out of date, so the parity checksum
>> > doesn't match even though data blindly reconstructed from the parity
>> > *would* match the data.
>> >
>>
>> Yep, that is how I read it too. So if your data is checksummed, it
>> should at least avoid silent corruption.
>>