The reason you can't find the "missing" blocks in zdb is that there aren't
any.  This is consistent with your "zpool list" output which shows that no
more space is allocated as you create more deduped-away "copies" (ALLOC is
constant).

The only thing that changes is the DMU's view of how much free space there
is.  What's happening is that the SPA tells the DMU that the pool is bigger
than it really is due to dedup (USED + AVAIL > SIZE).  The SPA also tells
the DMU how much "slop space" is unusable.  The strange thing is that the
"slop space" is based on the dedup-inflated fictional size of the pool.  So
as you have more dedup copies, the slop space increases.

This doesn't seem necessary to me.  The fix would be for
spa_get_slop_space() to stop including ddt_get_dedup_dspace() in the size
it bases the slop on (which it currently gets via spa_get_dspace()).
Instead it should take into account only
metaslab_class_get_dspace(spa_normal_class(spa)) (and probably the
vdev-removal adjustment from spa_get_dspace()).

--matt

On Tue, Jun 22, 2021 at 4:46 AM Rich <rincebr...@gmail.com> wrote:

> Hi everyone,
> I've spent a bit of time poking around in #12255 - the short version
> is "user reporting that, on a 120G pool with dedup on, writing
> additional copies of a 100G file where part of the first 128k is
> different occupies an additional ~3.2G each, until ENOSPC".
>
> Curiously, on my reproducing setup (git master, a 120G file-backed pool
> on an SSD, with single-file special and dedup vdevs to try to identify
> where the space goes, ashift 9, compression=lz4 even though the 100G is
> from /dev/urandom, recordsize=1M though that didn't noticeably change
> matters), zfs list -o space at only one copy reports:
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2         15.3G   100G        0B     24K             0B       100G
> dedup2/copyme  15.3G   100G        0B    100G             0B         0B
> and zpool list -v reports:
> NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> dedup2             158G  99.9G  58.1G        -         -     0%    63%  1.00x    ONLINE  -
>   /deduppool2      119G  99.9G  19.1G        -         -     0%  83.9%      -    ONLINE
> dedup                 -      -      -        -         -      -      -      -         -
>   /dedup2dedup    19.5G  31.3M  19.5G        -         -     0%  0.15%      -    ONLINE
> special               -      -      -        -         -      -      -      -         -
>   /dedup2special  19.5G  11.9M  19.5G        -         -     0%  0.05%      -    ONLINE
>
> while at n=2 (I used entirely identical files)...
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2         12.1G   200G        0B     24K             0B       200G
> dedup2/copyme  12.1G   200G        0B    200G             0B         0B
> NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> dedup2             158G   100G  57.9G        -         -     0%    63%  2.00x    ONLINE  -
>   /deduppool2      119G   100G    19G        -         -     0%  84.0%      -    ONLINE
> dedup                 -      -      -        -         -      -      -      -         -
>   /dedup2dedup    19.5G  43.4M  19.5G        -         -     0%  0.21%      -    ONLINE
> special               -      -      -        -         -      -      -      -         -
>   /dedup2special  19.5G  22.8M  19.5G        -         -     0%  0.11%      -    ONLINE
>
> ...and so on until it returns ENOSPC at n=6, while zpool list still
> reports 19G free:
> NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2            0B   587G        0B     24K             0B       587G
> dedup2/copyme     0B   587G        0B    587G             0B         0B
> NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> dedup2             158G   100G  57.9G        -         -     0%    63%  5.87x    ONLINE  -
>   /deduppool2      119G   100G    19G        -         -     0%  84.0%      -    ONLINE
> dedup                 -      -      -        -         -      -      -      -         -
>   /dedup2dedup    19.5G  40.9M  19.5G        -         -     0%  0.20%      -    ONLINE
> special               -      -      -        -         -      -      -      -         -
>   /dedup2special  19.5G  52.0M  19.4G        -         -     0%  0.26%      -    ONLINE
>
> My hypothesis was metadata, even though 3% overhead seemed like a lot,
> but closely examining a zdb -dddd diff between the n=1 and n=2 cases, I
> couldn't see any obvious culprit (and 5 or 6 ds proved too much for any
> non-diff utility to handle, while hundreds of MB of diff output was not
> particularly readable, even colorized).
>
> zdb -bb was also not informative (I have copies, if anyone wants, but
> suffice it to say they just report 99.9G/200G/.../587G in total, with
> nothing besides "ZFS plain file" reporting over 100M usage).
>
> Deleting the files frees the extra space, so it's not lost, it's
> just...allocated somewhere I don't readily see a way to report.
>
> Could someone actually familiar with how the dedup sausage works
> glance at this? I'm not averse to diving in and investigating further
> myself, but figured I'd at least try asking, since there seem to be a
> whole bunch of moving parts, and I'm not entirely sure where to look
> for "space zdb doesn't seem to count, but gets freed correctly".
>
> (This also works with similar overhead on sparse zvols, so it doesn't
> seem to be specific to files?)
>
> Thanks to whoever has any insight,
> - Rich
>
> ------------------------------------------
> openzfs: openzfs-developer
> Permalink:
> https://openzfs.topicbox.com/groups/developer/Tc83652ec01c42b81-M59c58b0d0aad209372c29c73
> Delivery options:
> https://openzfs.topicbox.com/groups/developer/subscription
>

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/Tc83652ec01c42b81-M32a5e18818c339468bb5cc3c
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription
