On 15/04/2021 20:37, Josef Bacik wrote:
> On 4/15/21 9:58 AM, Johannes Thumshirn wrote:
>> When a file gets deleted on a zoned file system, the space freed is not
>> returned back into the block group's free space, but is migrated to
>> zone_unusable.
>>
>> As this zone_unusable space is behind the current write pointer it is not
>> possible to use it for new allocations. In the current implementation a
>> zone is reset once all of the block group's space is accounted as zone
>> unusable.
>>
>> This behaviour can lead to premature ENOSPC errors on a busy file system.
>>
>> Instead of only reclaiming the zone once it is completely unusable,
>> kick off a reclaim job once the amount of unusable bytes exceeds a
>> user-configurable threshold between 51% and 100%. It can be set per
>> mounted filesystem via the sysfs tunable bg_reclaim_threshold, which
>> defaults to 75%.
>>
>> Similar to reclaiming unused block groups, these dirty block groups are
>> added to a to_reclaim list and, on transaction commit, the reclaim
>> process is triggered, but only after unused block groups have been
>> deleted, since that frees up space for the relocation process.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumsh...@wdc.com>
>> ---
>>  fs/btrfs/block-group.c       | 96 ++++++++++++++++++++++++++++++++++++
>>  fs/btrfs/block-group.h       |  3 ++
>>  fs/btrfs/ctree.h             |  6 +++
>>  fs/btrfs/disk-io.c           | 13 +++++
>>  fs/btrfs/free-space-cache.c  |  9 +++-
>>  fs/btrfs/sysfs.c             | 35 +++++++++++++
>>  fs/btrfs/volumes.c           |  2 +-
>>  fs/btrfs/volumes.h           |  1 +
>>  include/trace/events/btrfs.h | 12 +++++
>>  9 files changed, 175 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index bbb5a6e170c7..3f06ea42c013 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -1485,6 +1485,92 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>>  	spin_unlock(&fs_info->unused_bgs_lock);
>>  }
>>
>> +void btrfs_reclaim_bgs_work(struct work_struct *work)
>> +{
>> +	struct btrfs_fs_info *fs_info =
>> +		container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
>> +	struct btrfs_block_group *bg;
>> +	struct btrfs_space_info *space_info;
>> +	int ret = 0;
>> +
>> +	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
>> +		return;
>> +
>> +	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
>> +		return;
>> +
>> +	mutex_lock(&fs_info->reclaim_bgs_lock);
>> +	spin_lock(&fs_info->unused_bgs_lock);
>> +	while (!list_empty(&fs_info->reclaim_bgs)) {
>> +		bg = list_first_entry(&fs_info->reclaim_bgs,
>> +				      struct btrfs_block_group,
>> +				      bg_list);
>> +		list_del_init(&bg->bg_list);
>> +
>> +		space_info = bg->space_info;
>> +		spin_unlock(&fs_info->unused_bgs_lock);
>> +
>> +		/* Don't want to race with allocators so take the groups_sem */
>> +		down_write(&space_info->groups_sem);
>> +
>> +		spin_lock(&bg->lock);
>> +		if (bg->reserved || bg->pinned || bg->ro) {
>> +			/*
>> +			 * We want to bail if we made new allocations or have
>> +			 * outstanding allocations in this block group.  We do
>> +			 * the ro check in case balance is currently acting on
>> +			 * this block group.
>> +			 */
>> +			spin_unlock(&bg->lock);
>> +			up_write(&space_info->groups_sem);
>> +			goto next;
>> +		}
>> +		spin_unlock(&bg->lock);
>> +
>
> Here I think we want a
>
> 	if (btrfs_fs_closing())
> 		goto next;
>
> so we don't block out a umount for all eternity.  Thanks,
Right, will add.