On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
> On 2016-06-27 17:57, Zygo Blaxell wrote:
>> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>>> <ahferro...@gmail.com> wrote:
>>>> On 2016-06-25 12:44, Chris Murphy wrote:
>>>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>>> <ahferro...@gmail.com> wrote:
>>>>>
>>>>> OK but hold on. During scrub, it should read data, compute checksums
>>>>> *and* parity, and compare those to what's on-disk - EXTENT_CSUM in
>>>>> the checksum tree, and the parity strip in the chunk tree. And if
>>>>> parity is wrong, then it should be replaced.
>>>>
>>>> Except that's horribly inefficient. With limited exceptions involving
>>>> highly situational co-processors, computing a checksum of a parity
>>>> block is always going to be faster than computing parity for the
>>>> stripe. By using that to check parity, we can safely speed up the
>>>> common case of near-zero errors during a scrub by a pretty
>>>> significant factor.
>>>
>>> OK, I'm in favor of that. Although somehow md gets away with this by
>>> computing and checking parity for its scrubs, and still manages to
>>> keep drives saturated in the process - at least HDDs; I'm not sure how
>>> it fares on SSDs.
>>
>> A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
>> one at more than 10GB/sec. Maybe a bottleneck is within reach of an
>> array of SSDs vs. a slow CPU.
> OK, great for people who are using modern desktop or server CPUs. Not
> everyone has that luxury, and even on many such CPUs, it's _still_
> faster to compute CRC32c checksums. On top of that, we don't appear to
> be using the in-kernel parity-raid libraries (or if we are, I haven't
> been able to find where we are calling the functions for it), so we
> don't necessarily get assembly-optimized or co-processor-accelerated
> computation of the parity itself. The other thing that I didn't mention
> above, though, is that computing parity checksums will always take less
> time than computing parity, because you have to process significantly
> less data. On a 4-disk RAID5 array, you're processing roughly 2/3 as
> much data to do the parity checksums instead of the parity itself,
> which means that the parity computation would need to be 200% faster
> than the CRC32c computation to break even, and this margin gets bigger
> and bigger as you add more disks.
>
> On small arrays, this obviously won't have much impact. Once you start
> to scale past a few TB though, even a few hundred MB/s faster
> processing means a significant decrease in processing time. Say you
> have a CPU which gets about 12.0GB/s for RAID5 parity and about
> 12.25GB/s for CRC32c (~2% is a conservative ratio assuming you use the
> CRC32c instruction and assembly-optimized RAID5 parity computations on
> a modern x86_64 processor; the ratio on both the mobile Core i5 in my
> laptop and the Xeon E3 in my home server is closer to 5%).
> Assuming those numbers, and that we're already checking checksums on
> non-parity blocks, processing 120TB of data in a 4-disk array (which
> gives 40TB of parity data, so 160TB total) gives:
>
> For computing the parity to scrub:
> 120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of all
> the regular data
> 120TB / 12GB/s = 10000 seconds for processing parity of all stripes
> = 19795.9 seconds total
> ~ 5.5 hours total
>
> For computing csums of the parity:
> 120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of all
> the regular data
> 40TB / 12.25GB/s = 3265.3 seconds for processing CRC32c csums of all
> the parity data
> = 13061.2 seconds total
> ~ 3.6 hours total
>
> The checksum-based scrub takes approximately 34% less time than the
> parity-recompute scrub. Much of this, of course, is because you have to
> process the regular data twice for the parity computation method (once
> for csums, once for parity). You could probably do one pass computing
> both values, but that would need to be done carefully; and, without
> significant optimization, it would likely not get you much benefit other
> than cutting the number of loads in half.
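Putting that arithmetic into a throwaway script makes it easy to re-run with
different assumptions. This is only a back-of-the-envelope sketch using the
illustrative 12 / 12.25 GB/s figures quoted above, not measurements:

    # Back-of-the-envelope model of the two scrub strategies above (GB, GB/s).
    # Throughput figures are the illustrative ones from the mail, not measured.
    data_gb     = 120 * 1000      # regular data in the 4-disk raid5 example
    parity_gb   = data_gb / 3     # 3 data strips : 1 parity strip -> 40TB
    crc_gbps    = 12.25           # CRC32c throughput
    parity_gbps = 12.00           # raid5 parity generation throughput

    # Strategy 1: csum the data, then recompute parity from the data strips.
    recompute = data_gb / crc_gbps + data_gb / parity_gbps
    # Strategy 2: csum the data and csum the parity strips.
    csum_only = data_gb / crc_gbps + parity_gb / crc_gbps

    print("recompute parity: %8.1f s (%.1f h)" % (recompute, recompute / 3600))
    print("csum the parity:  %8.1f s (%.1f h)" % (csum_only, csum_only / 3600))

    # Break-even: with N data strips per stripe, recomputing parity chews
    # through N strips where checksumming the parity touches only 1, so
    # parity generation needs to be roughly N times as fast as CRC32c just
    # to tie (hence "200% faster" for the 3-data-strip case above).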
And it all means jack shit because you don't get the data to disk that
quick. Who cares if it's 500% faster - if it still saturates the
throughput of the actual drives, what difference does it make? I'm all
for actual solutions, but the nirvana fallacy seems to apply here...

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
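For rough scale on the I/O side of that point: assuming something like
150MB/s of sustained sequential read per spinning disk (an assumed figure,
not one from this thread), the read side alone of the 160TB example above
dwarfs either CPU-bound number:

    # Rough read-side bound for the same 160TB (data + parity) scrub.
    # 150MB/s sustained per disk is an assumed figure for spinning disks,
    # not something measured in this thread.
    total_gb      = 160 * 1000
    disks         = 4
    per_disk_gbps = 0.150                   # GB/s sustained read, assumed

    read_limited = total_gb / (disks * per_disk_gbps)
    print("read-limited scrub: %.0f s (~%.0f hours)"
          % (read_limited, read_limited / 3600))
    # ~266667 s, roughly 74 hours - an order of magnitude past either
    # CPU-bound figure, so the CPU difference only matters if the array
    # (e.g. fast SSDs) can actually feed data that quickly.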