On 12/04/2021 16:21, Johannes Thumshirn wrote: >> If it affects only the zoned case, I also don't see why doing it when >> not in zoned mode (and -o discard=sync is set). >> Unless of course the default discard mechanism is broken somehow, in >> which case we need to find out why and fix it. > > I'm at it. >
OK I've found out what's wrong. In btrfs_relocate_block_group() we're calling btrfs_inc_block_group_ro(). btrfs_update_block_group() will call btrfs_mark_bg_unused() to add the block group to the 'fs_info->unused_bgs' list. But in btrfs_delete_unused_bgs() we have this check: if (block_group->reserved || block_group->pinned || block_group->used || block_group->ro || list_is_singular(&block_group->list)) { /* * We want to bail if we made new allocations or have * outstanding allocations in this block group. We do * the ro check in case balance is currently acting on * this block group. */ trace_btrfs_skip_unused_block_group(block_group); spin_unlock(&block_group->lock); up_write(&space_info->groups_sem); goto next; } So we're skipping over the removal of the block group. I've instrumented the kernel and also tested with non-zoned devices, this skip is always present with block_group->ro = 1. Also the auto-relocation patch has a problem, not decrementing block_group->ro and so it's left at ro = 2 after auto relocation got triggered. So the question is, why are we leaving block_group->ro at 1 after relocation finished? Probably that the allocator doesn't pick the block group for the next allocation. What am I missing? Thanks, Johannes