On Fri, Mar 19, 2021 at 01:59:02PM -0400, Josef Bacik wrote:
> On 3/19/21 6:48 AM, Johannes Thumshirn wrote:
> > When a file gets deleted on a zoned file system, the space freed is not
> > returned back into the block group's free space, but is migrated to
> > zone_unusable.
> > 
> > As this zone_unusable space is behind the current write pointer it is not
> > possible to use it for new allocations. In the current implementation a
> > zone is reset once all of the block group's space is accounted as zone
> > unusable.
> > 
> > This behaviour can lead to premature ENOSPC errors on a busy file system.
> > 
> > Instead of only reclaiming the zone once it is completely unusable,
> > kick off a reclaim job once the amount of unusable bytes exceeds a user
> > configurable threshold between 51% and 100%. It can be set per mounted
> > filesystem via the sysfs tunable bg_reclaim_threshold which is set to 75%
> > per default.
> > 
> > Similar to reclaiming unused block groups, these dirty block groups are
> > added to a to_reclaim list and then on a transaction commit, the reclaim
> > process is triggered but after we deleted unused block groups, which will
> > free space for the relocation process.
> > 
> > Signed-off-by: Johannes Thumshirn <johannes.thumsh...@wdc.com>
> > ---
> > 
> > AFAICT sysfs_create_files() does not have the ability to provide a 
> > is_visible
> > callback, so the bg_reclaim_threshold sysfs file is visible for non zoned
> > filesystems as well, even though only for zoned filesystems we're adding 
> > block
> > groups to the reclaim list. I'm all ears for a approach that is sensible in
> > this regard.
> > 
> > 
> >   fs/btrfs/block-group.c       | 84 ++++++++++++++++++++++++++++++++++++
> >   fs/btrfs/block-group.h       |  2 +
> >   fs/btrfs/ctree.h             |  3 ++
> >   fs/btrfs/disk-io.c           | 11 +++++
> >   fs/btrfs/free-space-cache.c  |  9 +++-
> >   fs/btrfs/sysfs.c             | 35 +++++++++++++++
> >   fs/btrfs/volumes.c           |  2 +-
> >   fs/btrfs/volumes.h           |  1 +
> >   include/trace/events/btrfs.h | 12 ++++++
> >   9 files changed, 157 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index 9ae3ac96a521..af9026795ddd 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1485,6 +1485,80 @@ void btrfs_mark_bg_unused(struct btrfs_block_group 
> > *bg)
> >     spin_unlock(&fs_info->unused_bgs_lock);
> >   }
> > +void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
> > +{
> > +   struct btrfs_block_group *bg;
> > +   struct btrfs_space_info *space_info;
> > +   int ret = 0;
> > +
> > +   if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
> > +           return;
> > +
> > +   if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> > +           return;
> > +
> > +   mutex_lock(&fs_info->reclaim_bgs_lock);
> > +   while (!list_empty(&fs_info->reclaim_bgs)) {
> > +           bg = list_first_entry(&fs_info->reclaim_bgs,
> > +                                 struct btrfs_block_group,
> > +                                 bg_list);
> > +           list_del_init(&bg->bg_list);
> > +
> > +           space_info = bg->space_info;
> > +           mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +
> > +           /* Don't want to race with allocators so take the groups_sem */
> > +           down_write(&space_info->groups_sem);
> > +
> > +           spin_lock(&bg->lock);
> > +           if (bg->reserved || bg->pinned || bg->ro) {
> 
> How do we deal with backup supers in zoned again?  Will they show up as
> readonly?  If so we may not want the bg->ro check, but I may be insane.

No superblock/backups are placed into a zone composing a block group,
because, if placed, it becomes a hole blocking sequential writes. The
zones containing superblock/backups are reserved and no device extents
are allocated there. So, bg->ro == 0, if the block group is read-write.

Reply via email to