On Mon, Apr 12, 2021 at 2:49 PM Johannes Thumshirn
<johannes.thumsh...@wdc.com> wrote:
>
> On 09/04/2021 13:37, Filipe Manana wrote:
> > On Fri, Apr 9, 2021 at 11:54 AM Johannes Thumshirn
> > <johannes.thumsh...@wdc.com> wrote:
> >>
> >> When relocating a block group the freed up space is not discarded. On
> >> devices like SSDs this hint is useful to tell the device the space is
> >> freed now. On zoned block devices btrfs' discard code will reset the zone
> >> the block group is on, freeing up the occupied space.
> >>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumsh...@wdc.com>
> >> ---
> >>  fs/btrfs/volumes.c | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 6d9b2369f17a..d9ef8bce0cde 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -3103,6 +3103,10 @@ static int btrfs_relocate_chunk(struct 
> >> btrfs_fs_info *fs_info, u64 chunk_offset)
> >>         struct btrfs_root *root = fs_info->chunk_root;
> >>         struct btrfs_trans_handle *trans;
> >>         struct btrfs_block_group *block_group;
> >> +       const bool trim = btrfs_is_zoned(fs_info) ||
> >> +                               btrfs_test_opt(fs_info, DISCARD_SYNC);
> >> +       u64 trimmed;
> >> +       u64 length;
> >>         int ret;
> >>
> >>         /*
> >> @@ -3130,6 +3134,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info 
> >> *fs_info, u64 chunk_offset)
> >>         if (!block_group)
> >>                 return -ENOENT;
> >>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> >> +       length = block_group->length;
> >>         btrfs_put_block_group(block_group);
> >>
> >>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
> >> @@ -3144,6 +3149,14 @@ static int btrfs_relocate_chunk(struct 
> >> btrfs_fs_info *fs_info, u64 chunk_offset)
> >>          * step two, delete the device extents and the
> >>          * chunk tree entries
> >>          */
> >> +       if (trim) {
> >> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length,
> >> +                                          &trimmed);
> >
> > Ideally we do IO and potentially slow operations such as discard while
> > not holding a transaction handle open, to avoid slowing down anyone
> > trying to commit the transaction.
> >
> >> +               if (ret) {
> >> +                       btrfs_abort_transaction(trans, ret);
> >
> > This is leaking the transaction, still need to call btrfs_end_transaction().
> >
> > Making the discard before joining/starting transaction, would fix both 
> > things.
>
> Note taken.
>
> >
> > Now more importantly, I don't see why the freed space isn't already
> > discarded at this point.
> > Relocation creates delayed references to drop extents from the block
> > group, and when the delayed references are run, we pin the extents
> > through:
> >
> > __btrfs_free_extent() -> btrfs_update_block_group()
> >
> > Then at the very end of the transaction commit, we discard pinned
> > extents and then unpin them, at btrfs_finish_extent_commit().
> > Relocation commits the transaction at the end of
> > relocate_block_group(), so the delayed references are fun, and then we
> > discard and unpin their extents.
> > So that's why I don't get it why the discard is needed here (at least
> > when we are not on a zoned filesystem).
>
> This is a good point to investigate, but as of now, the zone isn't reset.
> Zone reset handling in btrfs is piggy backed onto discard handling, but
> from testing the patchset I see the zone isn't reset:
>
> [   81.014752] BTRFS info (device nullb0): reclaiming chunk 1073741824
> [   81.015732] BTRFS info (device nullb0): relocating block group 1073741824 
> flags data
> [   81.798090] BTRFS info (device nullb0): found 12452 extents, stage: move 
> data extents
> [   82.314562] BTRFS info (device nullb0): found 12452 extents, stage: update 
> data pointers
> # blkzone report -o $((1073741824 >> 9)) -c 1 /dev/nullb0
>   start: 0x000200000, len 0x080000, cap 0x080000, wptr 0x0799a0 reset:0 
> non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]

Not familiar with zoned device details, but what you are passing to
blkzone is a logical address, in non-zoned btrfs it does not always
matches the physical address on disk.
So maybe that is not a reliable way to check it.

>
> Whereas when the this patch is applied as well:
> [   85.126542] BTRFS info (device nullb0): reclaiming chunk 1073741824
> [   85.128211] BTRFS info (device nullb0): relocating block group 1073741824 
> flags data
> [   86.061831] BTRFS info (device nullb0): found 12452 extents, stage: move 
> data extents
> [   86.766549] BTRFS info (device nullb0): found 12452 extents, stage: update 
> data pointers
> # blkzone report -c 1 -o $((1073741824 >> 9)) /dev/nullb0
>   start: 0x000200000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 
> non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
>
> As a positive side effect of this, I now have code that can be used in
> xfstests to verify the patchset.

Ok.
Maybe the zone isn't reset properly because the default mechanism is
doing smaller discards, on a per extent basis, and perhaps the order
of those discards matters?

If it affects only the zoned case, I also don't see why doing it when
not in zoned mode (and -o discard=sync is set).
Unless of course the default discard mechanism is broken somehow, in
which case we need to find out why and fix it.

Thanks.

>
> Byte,
>         Johannes



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Reply via email to