On Tue, Apr 13, 2021 at 6:48 PM Johannes Thumshirn wrote:
>
> On 13/04/2021 14:57, Filipe Manana wrote:
> > And what about the other mechanism that triggers discards on pinned
> > extents, after the transaction commits the super blocks?
> > Why isn't that happening (with -o discard=sync)? We create the delayed
> > references to drop extents from the relocated block group, which
> > results in pinning extents.
> > This is the case that surprised me that it isn't working for you.
>
> I think this is the case. I would have expected to end up in this
> part of btrfs_finish_extent_commit():
>
>
> 	/*
> 	 * Transaction is finished. We don't need the lock anymore. We
> 	 * do need to clean up the block groups in case of a transaction
> 	 * abort.
> 	 */
> 	deleted_bgs = &trans->transaction->deleted_bgs;
> 	list_for_each_entry_safe(block_group, tmp, deleted_bgs, bg_list) {
> 		u64 trimmed = 0;
>
> 		ret = -EROFS;
> 		if (!TRANS_ABORTED(trans))
> 			ret = btrfs_discard_extent(fs_info,
> 						   block_group->start,
> 						   block_group->length,
> 						   &trimmed);
>
> 		list_del_init(&block_group->bg_list);
> 		btrfs_unfreeze_block_group(block_group);
> 		btrfs_put_block_group(block_group);
>
> 		if (ret) {
> 			const char *errstr = btrfs_decode_error(ret);
> 			btrfs_warn(fs_info,
> 				   "discard failed while removing blockgroup: errno=%d %s",
> 				   ret, errstr);
> 		}
> 	}
>
> and the btrfs_discard_extent() over the whole block group would then trigger a
> REQ_OP_ZONE_RESET operation, resetting the device's zone.
>
> But as btrfs_delete_unused_bgs() doesn't add the block group to the
> ->deleted_bgs list, we're not reaching the above code. I /think/ (i.e.
> verification pending) the -o discard=sync case works for regular block
> devices, as each extent is discarded on its own by this (also in
> btrfs_finish_extent_commit()):
>
> 	while (!TRANS_ABORTED(trans)) {
> 		struct extent_state *cached_state = NULL;
>
> 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
> 		ret = find_first_extent_bit(unpin, 0, &start, &end,
> 					    EXTENT_DIRTY, &cached_state);
> 		if (ret) {
> 			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
> 			break;
> 		}
>
> 		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> 			ret = btrfs_discard_extent(fs_info, start,
> 						   end + 1 - start, NULL);
>
> 		clear_extent_dirty(unpin, start, end, &cached_state);
> 		unpin_extent_range(fs_info, start, end, true);
> 		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
> 		free_extent_state(cached_state);
> 		cond_resched();
> 	}
>
> If this is the case, my patch will essentially discard the data twice, for a
> non-zoned block device, which is certainly not ideal.
Yep, that's what puzzled me: why would it be needed for non-zoned file
systems when using -o discard=sync?
I assumed you ran into a case where discard was not happening due to
some bug in the extent pinning/unpinning mechanism.
> So the correct fix would
> be to get the block group onto the 'trans->transaction->deleted_bgs' list
> after relocation, which would work if we didn't check for block_group->ro in
> btrfs_delete_unused_bgs(), but I suppose this check is there for a reason.
Actually the check for ->ro does not make sense anymore since I
introduced the delete_unused_bgs_mutex in commit
67c5e7d464bc466471b05e027abe8a6b29687ebd.
When the ->ro check was added
(47ab2a6c689913db23ccae38349714edf8365e0a), it was meant to prevent
the cleaner kthread and relocation tasks from calling
btrfs_remove_chunk() concurrently, but checking for ->ro only was
buggy, hence the addition of delete_unused_bgs_mutex later.
>
> How about changing the patch to the following:
Looks good.
However, would just removing the ->ro check be enough as well?
Thanks, Johannes.
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6d9b2369f17a..ba13b2ea3c6f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3103,6 +3103,9 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>  	struct btrfs_root *root = fs_info->chunk_root;
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_block_group *block_group;
> +	u64 length;
>  	int ret;
>
>  	/*
> @@ -3130,8 +3133,16 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>  	if (!block_group)
>  		return -ENOENT;
>  	btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> +	length = block_group->length;
>  	btrfs_put_block_group(block_group);
>
> +	/*
> +	 * For a zoned filesystem we need to discard/zone-reset here, as the
> +	 * discard code won't discard the whole block-group, but only single
> +	 * extents.
> +	 */
> +	if (btrfs_is_zoned(fs_info)) {
> +		ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
> +		if (ret) /* Non working discard is not fatal */
> +			btrfs_warn(fs_info, "discarding chunk %llu failed",
> +				   chunk_offset);
> +	}
> +
>  	trans = btrfs_start_trans_remove_block_group(root->fs_info,
>  						     chunk_offset);
>  	if (IS_ERR(trans)) {
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”