> On Jun 6, 2019, at 10:54 PM, Mike Gerdts <[email protected]> wrote:
>
> I'm motivated to make zfs set refreservation=auto do the right thing in the
> face of raidz and 4k physical blocks, but the data points I have are
> inconsistent with each other. Experimentation shows raidz2 parity overhead
> that matches my expectations for raidz1.
>
> Let's consider the case of a pool with 8 disks in one raidz2 vdev, ashift=12.
>
> In the spreadsheet
> <https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT689wTjHv6CGVElrPqTA0w_ZY/edit?pli=1#gid=930519344>
> from Matt's How I Learned to Stop Worrying and Love RAIDZ
> <https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz>
> blog entry, the "RAIDZ2 parity cost" sheet cells F4 and F5 suggest the
> parity and padding cost is 200%. That is, a 10 gig zvol with volblocksize=4k
> or 8k should end up taking up 30 gig of space either way.
>
> Experimentation tells me that each uses just a little more than double the
> amount that was calculated by refreservation=auto. In each of these cases,
> compression=off and I've overwritten them with `dd if=/dev/zero ...`
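For reference, the 200% expectation above can be reproduced with a quick
back-of-envelope sketch. It assumes the layout described in this thread
(8-disk raidz2 with ashift=12, i.e. 4K sectors, at most 6 data sectors plus
2 parity sectors per stripe, allocations padded to a multiple of
nparity+1 = 3 sectors) and only mirrors the round-up rule from the blog
post's spreadsheet, not the exact in-kernel allocation path:

  $ for vbs in 4096 8192; do
      data=$(( vbs / 4096 ))              # 4K data sectors (ashift=12)
      stripes=$(( (data + 5) / 6 ))       # each stripe holds up to 6 data sectors
      total=$(( data + 2 * stripes ))     # plus 2 parity sectors per stripe
      padded=$(( (total + 2) / 3 * 3 ))   # pad to a multiple of nparity+1 = 3
      echo "volblocksize=$vbs allocated=$(( padded * 4096 )) overhead=$(( (padded - data) * 100 / data ))%"
    done
  volblocksize=4096 allocated=12288 overhead=200%
  volblocksize=8192 allocated=24576 overhead=200%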
IIRC, the skip blocks are accounted for in the pool's "alloc", but not in the
dataset's "used".
 -- richard

> $ zfs get used,referenced,logicalused,logicalreferenced,volblocksize,refreservation zones/mg/disk0
> NAME            PROPERTY           VALUE  SOURCE
> zones/mg/disk0  used               21.4G  -
> zones/mg/disk0  referenced         21.4G  -
> zones/mg/disk0  logicalused        10.0G  -
> zones/mg/disk0  logicalreferenced  10.0G  -
> zones/mg/disk0  volblocksize       8K     default
> zones/mg/disk0  refreservation     10.3G  local
> $ zfs get used,referenced,logicalused,logicalreferenced,volblocksize,refreservation zones/mg/disk1
> NAME            PROPERTY           VALUE  SOURCE
> zones/mg/disk1  used               21.4G  -
> zones/mg/disk1  referenced         21.4G  -
> zones/mg/disk1  logicalused        10.0G  -
> zones/mg/disk1  logicalreferenced  10.0G  -
> zones/mg/disk1  volblocksize       4K     -
> zones/mg/disk1  refreservation     10.6G  local
> $ zpool status zones
>   pool: zones
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         zones                      ONLINE       0     0     0
>           raidz2-0                 ONLINE       0     0     0
>             c0t55CD2E404C314E1Ed0  ONLINE       0     0     0
>             c0t55CD2E404C314E85d0  ONLINE       0     0     0
>             c0t55CD2E404C315450d0  ONLINE       0     0     0
>             c0t55CD2E404C31554Ad0  ONLINE       0     0     0
>             c0t55CD2E404C315BB6d0  ONLINE       0     0     0
>             c0t55CD2E404C315BCDd0  ONLINE       0     0     0
>             c0t55CD2E404C315BFDd0  ONLINE       0     0     0
>             c0t55CD2E404C317724d0  ONLINE       0     0     0
>
> # echo ::spa -c | mdb -k | grep ashift | sort -u
>     ashift=000000000000000c
>
> Overwriting from /dev/urandom didn't change the above numbers in any
> significant way.
>
> My understanding is that each volblocksize block has its data and parity
> spread across a minimum of 3 devices, so that any two could be lost and the
> data still recovered. Considering the simple case of volblocksize=4k and
> ashift=12, 200% overhead for parity (+ no pad) seems spot-on. I seem to be
> seeing only 100% overhead for parity, plus a little for metadata and its
> parity.
>
> What fundamental concept am I missing?
>
> TIA,
> Mike
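One way to see the accounting difference Richard describes would be to compare
the pool-level and dataset-level numbers directly, e.g. by watching how much
the pool's "allocated" grows when the zvol is overwritten versus how much the
dataset's "used" grows. A minimal sketch, assuming the pool and zvol names
from this thread (no output shown; the exact figures will differ per pool):

  $ zpool list -p -o name,size,allocated zones
  $ zfs get -p used,referenced,logicalused zones/mg/disk1

If skip/pad sectors are charged only at the pool level, the growth in
"allocated" should exceed the growth in the dataset's "used" by roughly the
padding overhead.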
