On 13/04/2021 14:57, Filipe Manana wrote:
> And what about the other mechanism that triggers discards on pinned
> extents, after the transaction commits the super blocks?
> Why isn't that happening (with -o discard=sync)? We create the delayed
> references to drop extents from the relocated block group, which
> results in pinning extents.
> This is the case that surprised me that it isn't working for you.
I think this is the case. I would have expected to end up in this part
of btrfs_finish_extent_commit():

	/*
	 * Transaction is finished. We don't need the lock anymore. We
	 * do need to clean up the block groups in case of a transaction
	 * abort.
	 */
	deleted_bgs = &trans->transaction->deleted_bgs;
	list_for_each_entry_safe(block_group, tmp, deleted_bgs, bg_list) {
		u64 trimmed = 0;

		ret = -EROFS;
		if (!TRANS_ABORTED(trans))
			ret = btrfs_discard_extent(fs_info,
						   block_group->start,
						   block_group->length,
						   &trimmed);

		list_del_init(&block_group->bg_list);
		btrfs_unfreeze_block_group(block_group);
		btrfs_put_block_group(block_group);

		if (ret) {
			const char *errstr = btrfs_decode_error(ret);
			btrfs_warn(fs_info,
			   "discard failed while removing blockgroup: errno=%d %s",
				   ret, errstr);
		}
	}

and the btrfs_discard_extent() call over the whole block group would then
trigger a REQ_OP_ZONE_RESET operation, resetting the device's zone.

But as btrfs_delete_unused_bgs() doesn't add the block group to the
->deleted_bgs list, we never reach the code above.

I /think/ (i.e. verification pending) the -o discard=sync case works for
regular block devices, as each extent is discarded on its own by this
loop (also in btrfs_finish_extent_commit()):

	while (!TRANS_ABORTED(trans)) {
		struct extent_state *cached_state = NULL;

		mutex_lock(&fs_info->unused_bg_unpin_mutex);
		ret = find_first_extent_bit(unpin, 0, &start, &end,
					    EXTENT_DIRTY, &cached_state);
		if (ret) {
			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
			break;
		}

		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
			ret = btrfs_discard_extent(fs_info, start,
						   end + 1 - start, NULL);

		clear_extent_dirty(unpin, start, end, &cached_state);
		unpin_extent_range(fs_info, start, end, true);
		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
		free_extent_state(cached_state);
		cond_resched();
	}

If this is the case, my patch will essentially discard the data twice
on a non-zoned block device, which is certainly not ideal.
So the correct fix would be to get the block group onto the
'trans->transaction->deleted_bgs' list after relocation. That would work
if we didn't check for block_group->ro in btrfs_delete_unused_bgs(), but
I suppose this check is there for a reason.

How about changing the patch to the following:

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6d9b2369f17a..ba13b2ea3c6f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3103,6 +3103,9 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	struct btrfs_root *root = fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_block_group *block_group;
+	u64 length;
 	int ret;
 
 	/*
@@ -3130,8 +3133,16 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	if (!block_group)
 		return -ENOENT;
 	btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
+	length = block_group->length;
 	btrfs_put_block_group(block_group);
 
+	/*
+	 * For a zoned filesystem we need to discard/zone-reset here, as the
+	 * discard code won't discard the whole block-group, but only single
+	 * extents.
+	 */
+	if (btrfs_is_zoned(fs_info)) {
+		ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
+		if (ret) /* Non working discard is not fatal */
+			btrfs_warn(fs_info, "discarding chunk %llu failed",
+				   chunk_offset);
+	}
+
 	trans = btrfs_start_trans_remove_block_group(root->fs_info,
 						     chunk_offset);
 	if (IS_ERR(trans)) {
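The point of the btrfs_is_zoned() guard can be shown with an equally
hypothetical byte-counting model (names invented for illustration; it only
counts bytes handed to discard, not what the device does with them).
Setting zoned_only_guard to false mimics the original patch, true mimics
the revised one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Returns the total number of bytes passed to the (modeled) discard
 * routine for one relocated chunk.
 */
static uint64_t model_discarded_bytes(bool zoned, bool discard_sync,
				      bool zoned_only_guard, uint64_t chunk_len,
				      const struct model_extent *ext, size_t nr)
{
	uint64_t total = 0;
	size_t i;

	assert(nr == 0 || ext != NULL);

	/* Relocation-time discard of the whole chunk (the patch hunk). */
	if (!zoned_only_guard || zoned)
		total += chunk_len;

	/* Commit-time discard of each pinned extent (-o discard=sync). */
	if (discard_sync)
		for (i = 0; i < nr; i++)
			total += ext[i].len;

	return total;
}
```

For a non-zoned filesystem mounted with -o discard=sync, a 1 MiB chunk
holding two 4 KiB extents gets 1056768 bytes discarded without the guard
(the whole chunk, then both extents again) and 8192 bytes with it.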