Hi everyone,
I've spent a bit of time poking around in #12255 - the short version
is "user reports that, on a 120G pool with dedup on, writing
additional copies of a 100G file where part of the first 128k is
different occupies an additional ~3.2G each, until ENOSPC".
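For anyone who wants to play along, the rough shape of my setup - the
exact paths and sizes below are illustrative, not a canonical repro:

```shell
# Rough sketch of my reproducing setup -- paths/sizes are placeholders.
# File-backed main, dedup, and special vdevs so allocations can be told apart.
truncate -s 120G /deduppool2
truncate -s 20G  /dedup2dedup /dedup2special
zpool create -o ashift=9 dedup2 /deduppool2 \
    dedup /dedup2dedup special /dedup2special
zfs create -o dedup=on -o compression=lz4 -o recordsize=1M dedup2/copyme
dd if=/dev/urandom of=/dedup2/copyme/copy1 bs=1M count=102400
# Write identical copies until the pool reports ENOSPC (n=6 for me):
for n in 2 3 4 5 6; do cp /dedup2/copyme/copy1 /dedup2/copyme/copy$n; done
```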

Curiously, on my reproducing setup (git master, a 120G file-backed
pool on an SSD with single-file special and dedup vdevs to try to
identify where the space goes, ashift 9, compression=lz4 even though
the 100G is from /dev/urandom, recordsize=1M though that didn't
noticeably change matters), zfs list -o space at only one copy reports:
NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2         15.3G   100G        0B     24K             0B       100G
dedup2/copyme  15.3G   100G        0B    100G             0B         0B
and zpool list -v reports:
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
dedup2             158G  99.9G  58.1G        -         -     0%    63%  1.00x    ONLINE  -
  /deduppool2      119G  99.9G  19.1G        -         -     0%  83.9%      -    ONLINE
dedup                 -      -      -        -         -      -      -      -        -
  /dedup2dedup    19.5G  31.3M  19.5G        -         -     0%  0.15%      -    ONLINE
special               -      -      -        -         -      -      -      -        -
  /dedup2special  19.5G  11.9M  19.5G        -         -     0%  0.05%      -    ONLINE

while at n=2 (I used entirely identical files)...
NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2         12.1G   200G        0B     24K             0B       200G
dedup2/copyme  12.1G   200G        0B    200G             0B         0B
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
dedup2             158G   100G  57.9G        -         -     0%    63%  2.00x    ONLINE  -
  /deduppool2      119G   100G    19G        -         -     0%  84.0%      -    ONLINE
dedup                 -      -      -        -         -      -      -      -        -
  /dedup2dedup    19.5G  43.4M  19.5G        -         -     0%  0.21%      -    ONLINE
special               -      -      -        -         -      -      -      -        -
  /dedup2special  19.5G  22.8M  19.5G        -         -     0%  0.11%      -    ONLINE

...and so on, until the write returns ENOSPC at n=6, while zpool list
still reports 19G free:
NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2            0B   587G        0B     24K             0B       587G
dedup2/copyme     0B   587G        0B    587G             0B         0B
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
dedup2             158G   100G  57.9G        -         -     0%    63%  5.87x    ONLINE  -
  /deduppool2      119G   100G    19G        -         -     0%  84.0%      -    ONLINE
dedup                 -      -      -        -         -      -      -      -        -
  /dedup2dedup    19.5G  40.9M  19.5G        -         -     0%  0.20%      -    ONLINE
special               -      -      -        -         -      -      -      -        -
  /dedup2special  19.5G  52.0M  19.4G        -         -     0%  0.26%      -    ONLINE

My hypothesis was metadata, even though 3% overhead seemed like a lot,
but examining a diff of the zdb -dddd output between the n=1 and n=2
cases closely, I couldn't see any obvious culprit. (And the n=5 or n=6
dumps proved too much for any non-diff utility to handle, while
hundreds of MB of diff output was not particularly readable even
colorized.)
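(For anyone wanting to repeat the comparison, it amounted to something
like the following - dataset name taken from my setup:

```shell
# Dump object-level details at each copy count and diff the dumps.
zdb -dddd dedup2/copyme > zdb-n1.txt   # after the first copy
# ... write the second copy ...
zdb -dddd dedup2/copyme > zdb-n2.txt
diff -u zdb-n1.txt zdb-n2.txt | less
```

...which scales badly once the dumps get large.)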

zdb -bb was also not informative (I have copies, if anyone wants, but
suffice it to say it just reports 99.9G/200G/.../587G in total, with
nothing besides "ZFS plain file" reporting over 100M of usage).

Deleting the files frees the extra space, so it's not lost, it's
just...allocated somewhere I don't readily see a way to report.

Could someone actually familiar with how the dedup sausage works
glance at this? I'm not averse to diving in and investigating further
myself, but figured I'd at least try asking, since there seem to be a
whole bunch of moving parts, and I'm not entirely sure where to look
for "space zdb doesn't seem to count, but gets freed correctly".

(This also works with similar overhead on sparse zvols, so it doesn't
seem to be specific to files?)

Thanks to whoever has any insight,
- Rich

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/Tc83652ec01c42b81-M59c58b0d0aad209372c29c73