The reason you can't find the "missing" blocks in zdb is that there aren't any. This is consistent with your "zpool list" output, which shows that no more space is allocated as you create more deduped-away "copies" (ALLOC is constant).
The only thing that changes is the DMU's view of how much free space there is. What's happening is that the SPA tells the DMU that, due to dedup, the pool is bigger than it really is (USED + AVAIL > SIZE). The SPA also tells the DMU how much "slop space" is unusable. The strange part is that the slop space is based on the dedup-inflated, fictional size of the pool, so as you create more dedup copies, the slop space grows as well. That doesn't seem necessary to me.

The fix would be for spa_get_slop_space() to not include ddt_get_dedup_dspace() in its overall size (via spa_get_dspace()). Instead it should take into account only metaslab_class_get_dspace(spa_normal_class(spa)) (and probably the vdev-removal adjustment from spa_get_dspace()).

--matt

On Tue, Jun 22, 2021 at 4:46 AM Rich <rincebr...@gmail.com> wrote:
> Hi everyone,
>
> I've spent a bit of time poking around in #12255 - the short version
> is "user reporting that, on a 120G pool with dedup on, writing
> additional copies of a 100G file where part of the first 128k is
> different occupies an additional ~3.2G each, until ENOSPC".
>
> Curiously, on my reproducing setup (git master, 120G file on an SSD
> with single-file special and dedup vdevs to try to identify where the
> space goes, ashift 9, compression=lz4 even though the 100G is from
> /dev/urandom, recordsize=1M though that didn't noticeably change
> matters), zfs list -o space at only one copy reports:
>
> NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2         15.3G  100G        0B     24K             0B       100G
> dedup2/copyme  15.3G  100G        0B    100G             0B         0B
>
> and zpool list -v reports:
>
> NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
> dedup2            158G  99.9G  58.1G        -         -    0%    63%  1.00x  ONLINE  -
>   /deduppool2     119G  99.9G  19.1G        -         -    0%  83.9%      -  ONLINE
> dedup                -      -      -        -         -     -      -      -       -
>   /dedup2dedup   19.5G  31.3M  19.5G        -         -    0%  0.15%      -  ONLINE
> special              -      -      -        -         -     -      -      -       -
>   /dedup2special 19.5G  11.9M  19.5G        -         -    0%  0.05%      -  ONLINE
>
> while at n=2 (I used entirely identical files)...
> NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2         12.1G  200G        0B     24K             0B       200G
> dedup2/copyme  12.1G  200G        0B    200G             0B         0B
>
> NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
> dedup2            158G   100G  57.9G        -         -    0%    63%  2.00x  ONLINE  -
>   /deduppool2     119G   100G    19G        -         -    0%  84.0%      -  ONLINE
> dedup                -      -      -        -         -     -      -      -       -
>   /dedup2dedup   19.5G  43.4M  19.5G        -         -    0%  0.21%      -  ONLINE
> special              -      -      -        -         -     -      -      -       -
>   /dedup2special 19.5G  22.8M  19.5G        -         -    0%  0.11%      -  ONLINE
>
> ...and so on until it returns ENOSPC at n=6, while zpool list still
> reports 19G free:
>
> NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dedup2            0B  587G        0B     24K             0B       587G
> dedup2/copyme     0B  587G        0B    587G             0B         0B
>
> NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
> dedup2            158G   100G  57.9G        -         -    0%    63%  5.87x  ONLINE  -
>   /deduppool2     119G   100G    19G        -         -    0%  84.0%      -  ONLINE
> dedup                -      -      -        -         -     -      -      -       -
>   /dedup2dedup   19.5G  40.9M  19.5G        -         -    0%  0.20%      -  ONLINE
> special              -      -      -        -         -     -      -      -       -
>   /dedup2special 19.5G  52.0M  19.4G        -         -    0%  0.26%      -  ONLINE
>
> My hypothesis was metadata, even though 3% overhead seemed like a lot,
> but examining the zdb -dddd diff between the n=1 and n=2 cases
> closely, I couldn't see any obvious culprit. (And 5 or 6 datasets
> proved too much for any non-diff utility to handle, while hundreds of
> MB of diff output was not particularly readable even colorized.)
>
> zdb -bb was also not informative (I have copies, if anyone wants, but
> suffice it to say they just report 99.9G/200G/.../587G in total, with
> nothing besides "ZFS plain file" reporting over 100M usage).
>
> Deleting the files frees the extra space, so it's not lost; it's
> just... allocated somewhere I don't readily see a way to report.
>
> Could someone actually familiar with how the dedup sausage works
> glance at this?
> I'm not averse to diving in and investigating further myself, but I
> figured I'd at least try asking, since there seem to be a whole bunch
> of moving parts, and I'm not entirely sure where to look for "space
> zdb doesn't seem to count, but gets freed correctly".
>
> (This also happens with similar overhead on sparse zvols, so it
> doesn't seem to be specific to files?)
>
> Thanks to whoever has any insight,
> - Rich
>
> ------------------------------------------
> openzfs: openzfs-developer
> Permalink:
> https://openzfs.topicbox.com/groups/developer/Tc83652ec01c42b81-M59c58b0d0aad209372c29c73
> Delivery options:
> https://openzfs.topicbox.com/groups/developer/subscription
> ------------------------------------------

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/Tc83652ec01c42b81-M32a5e18818c339468bb5cc3c
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription
------------------------------------------