On 28/06/16 22:25, Austin S. Hemmelgarn wrote:
> On 2016-06-28 08:14, Steven Haigh wrote:
>> On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
>>> On 2016-06-27 17:57, Zygo Blaxell wrote:
>>>> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>>>>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>>>>> <ahferro...@gmail.com> wrote:
>>>>>> On 2016-06-25 12:44, Chris Murphy wrote:
>>>>>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>>>>> <ahferro...@gmail.com> wrote:
>>>>>>>
>>>>>>> OK but hold on. During scrub, it should read data, compute
>>>>>>> checksums *and* parity, and compare those to what's on-disk -
>>>>>>> EXTENT_CSUM in the checksum tree, and the parity strip in the
>>>>>>> chunk tree. And if parity is wrong, then it should be replaced.
>>>>>>
>>>>>> Except that's horribly inefficient. With limited exceptions
>>>>>> involving highly situational co-processors, computing a checksum
>>>>>> of a parity block is always going to be faster than computing
>>>>>> parity for the stripe. By using that to check parity, we can
>>>>>> safely speed up the common case of near-zero errors during a
>>>>>> scrub by a pretty significant factor.
>>>>>
>>>>> OK, I'm in favor of that. Although somehow md gets away with this
>>>>> by computing and checking parity for its scrubs, and still manages
>>>>> to keep drives saturated in the process - at least HDDs; I'm not
>>>>> sure how it fares on SSDs.
>>>>
>>>> A modest desktop CPU can compute raid6 parity at 6GB/sec, a
>>>> less-modest one at more than 10GB/sec. Maybe a bottleneck is within
>>>> reach of an array of SSDs vs. a slow CPU.
>>> OK, great for people who are using modern desktop or server CPUs.
>>> Not everyone has that luxury, and even on many such CPUs, it's
>>> _still_ faster to compute CRC32c checksums.
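[As an aside, the two scrub strategies being compared above can be
sketched roughly as follows. This is an illustrative sketch only, not
btrfs code: btrfs uses CRC32c, while Python's zlib.crc32 is a plain
CRC32 standing in here as a generic checksum.]

```python
# Two ways to validate a RAID5 parity strip during scrub (sketch).
import zlib
from functools import reduce

def xor_parity(data_strips):
    """Recompute RAID5 parity as the byte-wise XOR of the data strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_strips))

def scrub_by_recompute(data_strips, parity_strip):
    # Reads and XORs every data strip: work grows with stripe width.
    return xor_parity(data_strips) == parity_strip

def scrub_by_checksum(parity_strip, stored_csum):
    # Touches only the parity strip itself, so far less data to process.
    return zlib.crc32(parity_strip) == stored_csum

# 3 data strips + 1 parity strip (a 4-disk RAID5 stripe)
strips = [bytes([i]) * 16 for i in (1, 2, 3)]
parity = xor_parity(strips)
csum = zlib.crc32(parity)

print(scrub_by_recompute(strips, parity))   # True
print(scrub_by_checksum(parity, csum))      # True
```

[Both checks pass on an intact stripe; the point of the thread is only
how much data each one has to read and process to get there.]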
>>> On top of that, we don't appear to be using the in-kernel
>>> parity-raid libraries (or if we are, I haven't been able to find
>>> where we are calling the functions for it), so we don't necessarily
>>> get assembly-optimized or co-processor accelerated computation of
>>> the parity itself. The other thing that I didn't mention above,
>>> though, is that computing parity checksums will always take less
>>> time than computing parity, because you have to process
>>> significantly less data. On a 4-disk RAID5 array, you're processing
>>> roughly 2/3 as much data to do the parity checksums instead of
>>> parity itself, which means that the parity computation would need to
>>> be 200% faster than the CRC32c computation to break even, and this
>>> margin gets bigger and bigger as you add more disks.
>>>
>>> On small arrays, this obviously won't have much impact. Once you
>>> start to scale past a few TB though, even a few hundred MB/s faster
>>> processing means a significant decrease in processing time. Say you
>>> have a CPU which gets about 12.0GB/s for RAID5 parity, and about
>>> 12.25GB/s for CRC32c (~2% is a conservative ratio, assuming you use
>>> the CRC32c instruction and assembly-optimized RAID5 parity
>>> computations on a modern x86_64 processor; the ratio on both the
>>> mobile Core i5 in my laptop and the Xeon E3 in my home server is
>>> closer to 5%).
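[The break-even argument quoted above follows from strip counts alone;
a quick sketch of the model (mine, not from the thread): each RAID5
stripe on n disks has n-1 data strips and 1 parity strip, so
recomputing parity touches n-1 units of data while checksumming the
parity touches 1, and the parity code must therefore run (n-1)x the
checksum speed just to break even.]

```python
# Break-even speed ratio for recomputing parity vs. checksumming it,
# per stripe, on an n-disk RAID5 array (sketch of the argument above).
def breakeven_speedup(n_disks):
    data_units = n_disks - 1   # data strips read to recompute parity
    parity_units = 1           # parity strips read to checksum it
    return data_units / parity_units

for n in (4, 6, 10):
    x = breakeven_speedup(n)
    print(f"{n} disks: parity must run {x:.0f}x ({(x - 1) * 100:.0f}% "
          f"faster than the checksum) to break even")
```

[For 4 disks this gives 3x, i.e. the 200% figure quoted above, and the
margin indeed widens with every disk added.]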
>>> Assuming those numbers, and that we're already checking checksums
>>> on non-parity blocks, processing 120TB of data in a 4-disk array
>>> (which gives 40TB of parity data, so 160TB total) gives:
>>>
>>> For computing the parity to scrub:
>>> 120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of
>>> all the regular data
>>> 120TB / 12GB/s = 10000 seconds for processing parity of all stripes
>>> = 19795.9 seconds total
>>> ~ 5.5 hours total
>>>
>>> For computing csums of the parity:
>>> 120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of
>>> all the regular data
>>> 40TB / 12.25GB/s = 3265.3 seconds for processing CRC32c csums of
>>> all the parity data
>>> = 13061.2 seconds total
>>> ~ 3.6 hours total
>>>
>>> The checksum-based computation is approximately 34% faster than the
>>> parity computation. Much of this, of course, is that you have to
>>> process the regular data twice for the parity computation method
>>> (once for csums, once for parity). You could probably do one pass
>>> computing both values, but that would need to be done carefully;
>>> and, without significant optimization, would likely not get you
>>> much benefit other than cutting the number of loads in half.
>>
>> And it all means jack shit because you don't get the data to disk
>> that quick. Who cares if it's 500% faster - if it still saturates
>> the throughput of the actual drives, what difference does it make?
> It has less impact on everything else running on the system at the
> time because it uses less CPU time and potentially less memory. This
> is the exact same reason that you want your RAID parity computation
> performance as good as possible: the less time the CPU spends on
> that, the more it can spend on other things.
> On top of that, there are high-end systems that do have SSDs capable
> of multiple GB/s of throughput, and NVDIMMs are starting to become
> popular in the server market, and those give you data transfer
> speeds equivalent to regular memory bandwidth (which can be well
> over 20GB/s on decent hardware; I've got a relatively inexpensive
> system using DDR3-1866 RAM that has roughly 22-24GB/s of memory
> bandwidth). Looking at this another way, the fact that the storage
> device is the bottleneck right now is not a good excuse to avoid
> making everything else as efficient as possible.
If it's purely about performance, then start with multi-threading as a
base - not chopping features to make better performance. I'm not aware
of any modern CPU that comes with a single core these days, so parallel
workloads are much more efficient than a single thread. Yes, it's a law
of diminishing returns - but if you're not doing a full check of the
data when one would assume you are, then is that broken by design?

Personally, during a scrub, I would want to know if either the checksum
OR the parity is wrong - as that indicates problems at a much deeper
level. As someone who just lost ~4TB of data due to BTRFS bugs,
protection of data trumps performance in most cases.

--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897