On Fri, Mar 19, 2021 at 01:59:02PM -0400, Josef Bacik wrote:
> On 3/19/21 6:48 AM, Johannes Thumshirn wrote:
> > When a file gets deleted on a zoned file system, the space freed is not
> > returned back into the block group's free space, but is migrated to
> > zone_unusable.
> >
> > As this zone_unusable space is behind the current write pointer it is not
> > possible to use it for new allocations. In the current implementation a
> > zone is reset once all of the block group's space is accounted as zone
> > unusable.
> >
> > This behaviour can lead to premature ENOSPC errors on a busy file system.
> >
> > Instead of only reclaiming the zone once it is completely unusable,
> > kick off a reclaim job once the amount of unusable bytes exceeds a user
> > configurable threshold between 51% and 100%. It can be set per mounted
> > filesystem via the sysfs tunable bg_reclaim_threshold which is set to 75%
> > per default.
> >
> > Similar to reclaiming unused block groups, these dirty block groups are
> > added to a to_reclaim list and then on a transaction commit, the reclaim
> > process is triggered but after we deleted unused block groups, which will
> > free space for the relocation process.
> >
> > Signed-off-by: Johannes Thumshirn <johannes.thumsh...@wdc.com>
> > ---
> >
> > AFAICT sysfs_create_files() does not have the ability to provide a
> > is_visible callback, so the bg_reclaim_threshold sysfs file is visible
> > for non zoned filesystems as well, even though only for zoned
> > filesystems we're adding block groups to the reclaim list. I'm all ears
> > for a approach that is sensible in this regard.
> >
> >
> >  fs/btrfs/block-group.c       | 84 ++++++++++++++++++++++++++++++++++++
> >  fs/btrfs/block-group.h       |  2 +
> >  fs/btrfs/ctree.h             |  3 ++
> >  fs/btrfs/disk-io.c           | 11 +++++
> >  fs/btrfs/free-space-cache.c  |  9 +++-
> >  fs/btrfs/sysfs.c             | 35 +++++++++++++++
> >  fs/btrfs/volumes.c           |  2 +-
> >  fs/btrfs/volumes.h           |  1 +
> >  include/trace/events/btrfs.h | 12 ++++++
> >  9 files changed, 157 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index 9ae3ac96a521..af9026795ddd 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1485,6 +1485,80 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
> >  	spin_unlock(&fs_info->unused_bgs_lock);
> >  }
> > +void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
> > +{
> > +	struct btrfs_block_group *bg;
> > +	struct btrfs_space_info *space_info;
> > +	int ret = 0;
> > +
> > +	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
> > +		return;
> > +
> > +	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> > +		return;
> > +
> > +	mutex_lock(&fs_info->reclaim_bgs_lock);
> > +	while (!list_empty(&fs_info->reclaim_bgs)) {
> > +		bg = list_first_entry(&fs_info->reclaim_bgs,
> > +				      struct btrfs_block_group,
> > +				      bg_list);
> > +		list_del_init(&bg->bg_list);
> > +
> > +		space_info = bg->space_info;
> > +		mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +
> > +		/* Don't want to race with allocators so take the groups_sem */
> > +		down_write(&space_info->groups_sem);
> > +
> > +		spin_lock(&bg->lock);
> > +		if (bg->reserved || bg->pinned || bg->ro) {
>
> How do we deal with backup supers in zoned again? Will they show up as
> readonly? If so we may not want the bg->ro check, but I may be insane.
No superblock, primary or backup, is placed in a zone that is part of a
block group. If one were, it would sit in the zone as a hole and block
sequential writes. The zones that hold the superblock and its backups are
reserved, and no device extents are allocated in them. So bg->ro == 0 as
long as the block group is read-write.
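
To illustrate the reservation described above, here is a minimal user-space
sketch, not the btrfs code. struct toy_zone, its fields and
toy_alloc_dev_extent() are hypothetical names made up for this example. It
only shows the idea that the allocator skips zones reserved for superblocks,
so a block group never ends up containing one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a zone, not a btrfs structure. */
struct toy_zone {
	bool sb_reserved;	/* reserved for a superblock or backup copy */
	bool has_dev_extent;	/* already backs a block group */
};

/*
 * Pick the first zone that may back a new block group.
 *
 * Zones reserved for superblocks are skipped, as are zones that already
 * carry a device extent. On success the zone is marked as used and its
 * index is returned; -1 means no zone is available.
 */
int toy_alloc_dev_extent(struct toy_zone *zones, size_t nr_zones)
{
	assert(zones != NULL);

	for (size_t i = 0; i < nr_zones; i++) {
		if (zones[i].sb_reserved || zones[i].has_dev_extent)
			continue;
		zones[i].has_dev_extent = true;
		return (int)i;
	}
	return -1;
}
```

With zones 0 and 1 marked sb_reserved in a four-zone array, successive calls
hand out zones 2 and 3 and then fail, and the reserved zones never get a
device extent.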