Re: [developer] raidz overhead with ashift=12

Matthew Ahrens Fri, 07 Jun 2019 12:47:28 -0700

On Fri, Jun 7, 2019 at 11:06 AM Eric Borisch <[email protected]> wrote:


> On Fri, Jun 7, 2019 at 12:03 PM Matthew Ahrens <[email protected]>
> wrote:
>
>> The spreadsheet shows how much space will be allocated, which is
>> reflected in the zpool `allocated` property.  However, you are looking at
>> the zfs `used` and `referenced` properties.  These properties (as well as
>> `available` and all other zfs (not zpool) accounting values) take into
>> account the expected RAIDZ overhead, which is calculated assuming 128K
>> logical size blocks.  This means that zfs accounting hides the parity (and
>> padding) overhead when the block size is around 128K.  Other block sizes
>> may see (typically only slightly) more or less space consumed than expected
>> (e.g. if the `recordsize` property has been changed, a 1GB file may have
>> zfs `used` of 0.9G, or 1.1G).
>>
>> As indicated in cell F23, the expected overhead for 4K-sector 8-wide
>> RAIDZ2 is 41% (which is around what the RAID6 overhead would be, 2/6 =
>> 33%).  This is taken into account in the "RAID-Z deflation ratio"
>> (`vdev_deflate_ratio`).  In other words, `used = allocated / 1.41`.  If we
>> undo that, we get `21.4G * 1.41 = 30.2G`, which is around what we expected.
>>
>
> Aha! I've often wondered why I couldn't get some values to quite
> line up with what I understood to be occurring on disk. Looks like a
> potential area for improvement;
>

I agree this is confusing and it's an area we should try to improve!
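
For anyone who wants to check the 41% figure by hand, here is a minimal
Python sketch of the arithmetic in the quoted message.  It assumes the
commonly described RAIDZ allocation rule (data sectors, plus `nparity`
parity sectors per stripe row, padded up to a multiple of `nparity + 1`
sectors).  The function name `raidz_asize` and its parameters are made up
for illustration; this is not the actual ZFS code.

```python
def raidz_asize(psize, ashift=12, width=8, nparity=2):
    """Bytes allocated on a RAIDZ vdev for one block of psize bytes (sketch)."""
    sector = 1 << ashift
    data = -(-psize // sector)                  # data sectors, rounded up
    rows = -(-data // (width - nparity))        # stripe rows needed for the data
    total = data + nparity * rows               # add parity sectors for each row
    pad_to = nparity + 1
    total = -(-total // pad_to) * pad_to        # pad to a multiple of nparity + 1
    return total * sector


LOGICAL = 128 * 1024                            # block size the accounting assumes
allocated = raidz_asize(LOGICAL)                # 45 sectors of 4K = 184320 bytes
ratio = allocated / LOGICAL                     # expected-overhead ratio

print(f"allocated per 128K block: {allocated // 1024}K")    # 180K
print(f"ratio: {ratio:.5f}")                                # 1.40625
print(f"parity-only ideal: {8 / 6:.3f}")                    # 1.333, i.e. 2/6 = 33%
print(f"21.4G used -> {21.4 * ratio:.1f}G allocated")       # 30.1G
```

With the unrounded ratio, 21.4G of `used` works out to about 30.1G
allocated, in line with the ~30.2G obtained from the rounded 1.41.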


> for ZVOLs, wouldn't this calculation be better served by considering
> the volblocksize (and associated overhead) of each volume? The 'typically
> only slightly' changes to 'wildly differs' with RAIDZ2/3 and small
> volblocksizes.
>

I get the idea there, but it isn't very straightforward, because the
deflate ratio is specific to the RAIDZ vdev, which can store different
zvols (with different volblocksizes) and filesystems.

This is intentional - if a zvol that's "using" 1TB has actually allocated
more space than a filesystem that's also "using" 1TB, that would be even
more confusing than the current situation.
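
To put rough numbers on the small-volblocksize case, here is a sketch in
Python.  It assumes a 4K-sector 8-wide RAIDZ2, uncompressed blocks written
at the full block size, and the commonly described RAIDZ allocation rule
(parity per stripe row, padded to a multiple of `nparity + 1` sectors).
`raidz_asize` and `reported_used` are hypothetical helper names, not ZFS
functions.

```python
def raidz_asize(psize, ashift=12, width=8, nparity=2):
    """Bytes allocated on a RAIDZ vdev for one block of psize bytes (sketch)."""
    sector = 1 << ashift
    data = -(-psize // sector)                  # data sectors, rounded up
    rows = -(-data // (width - nparity))        # stripe rows needed for the data
    total = data + nparity * rows               # add parity sectors for each row
    pad_to = nparity + 1
    total = -(-total // pad_to) * pad_to        # pad to a multiple of nparity + 1
    return total * sector


# One vdev-wide ratio, derived from a 128K block, applies to every dataset.
DEFLATE = raidz_asize(128 * 1024) / (128 * 1024)


def reported_used(logical):
    """Space charged to `used` for one block: allocated size / vdev-wide ratio."""
    return raidz_asize(logical) / DEFLATE


for kib in (4, 8, 16, 32, 64, 128):
    logical = kib * 1024
    alloc = raidz_asize(logical)
    used = reported_used(logical)
    print(f"{kib:>4}K block: allocated {alloc // 1024:>3}K, "
          f"used {used / 1024:6.1f}K, used/logical {used / logical:.2f}")
```

Under these assumptions a 128K block is charged exactly 128K, 16K through
64K blocks are charged about 7% more than their logical size, and 4K or 8K
blocks are charged a bit more than double.  In every case `used` stays
proportional to what was actually allocated.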

If we took into account the additional overhead when calculating each
dataset's "available", then the confusions would be:
- different datasets have different amounts "available" even if they don't
have quotas/reservations
- a zvol could have 1TB "available", but after writing 1TB it is using an
additional 1.5TB.  So the space "used" and "available" are actually in
different units, and adding up used+available wouldn't make any sense.

These are less confusing than changing "used" would be, but probably still
not worth it, in my opinion.
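
A rough sketch of those two confusions, in Python.  Everything here is
hypothetical: the 10 TiB of raw free space is invented, slop space and
other reservations are ignored, and `hypothetical_available` models the
alternative being discussed, not anything ZFS does today.  The allocation
rule is the commonly described one for a 4K-sector 8-wide RAIDZ2.

```python
def raidz_asize(psize, ashift=12, width=8, nparity=2):
    """Bytes allocated on a RAIDZ vdev for one block of psize bytes (sketch)."""
    sector = 1 << ashift
    data = -(-psize // sector)                  # data sectors, rounded up
    rows = -(-data // (width - nparity))        # stripe rows needed for the data
    total = data + nparity * rows               # add parity sectors for each row
    pad_to = nparity + 1
    total = -(-total // pad_to) * pad_to        # pad to a multiple of nparity + 1
    return total * sector


def overhead_ratio(blocksize):
    """Allocated bytes per logical byte for blocks of the given size."""
    return raidz_asize(blocksize) / blocksize


TIB = 2**40
RAW_FREE = 10 * TIB                             # invented figure for illustration
DEFLATE = overhead_ratio(128 * 1024)            # vdev-wide ratio (128K blocks)
available_today = RAW_FREE / DEFLATE            # same for every dataset


def hypothetical_available(blocksize):
    """'available' if it were scaled by the dataset's own block size."""
    return RAW_FREE / overhead_ratio(blocksize)


def used_growth(bytes_written, blocksize):
    """Increase in 'used' (still deflated by the vdev-wide ratio)."""
    return bytes_written * overhead_ratio(blocksize) / DEFLATE


for bs in (8 * 1024, 128 * 1024):
    avail = hypothetical_available(bs)
    print(f"{bs // 1024:>3}K blocks: available {avail / TIB:.2f} TiB, "
          f"used after filling it grows by {used_growth(avail, bs) / TIB:.2f} TiB")
```

In this sketch the 8K zvol would show about 3.33 TiB "available" while
the 128K filesystem shows about 7.11 TiB, and filling the zvol's 3.33 TiB
would raise its "used" by 7.11 TiB, so the two numbers no longer add up.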

So I think we might need to do some more brainstorming to come up with
something that is a net improvement on the current situation.

--matt

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/Tf89af487ee658da3-M4ff86752be304fa3bdf19752
