On Tue, Apr 13, 2021 at 1:44 PM Johannes Thumshirn
<johannes.thumsh...@wdc.com> wrote:
> On 12/04/2021 16:21, Johannes Thumshirn wrote:
> >> If it affects only the zoned case, I also don't see why doing it when
> >> not in zoned mode (and -o discard=sync is set).
> >> Unless of course the default discard mechanism is broken somehow, in
> >> which case we need to find out why and fix it.
> >
> > I'm at it.
> >
> OK I've found out what's wrong. In btrfs_relocate_block_group() we're calling
> btrfs_inc_block_group_ro(). btrfs_update_block_group() will call
> btrfs_mark_bg_unused() to add the block group to the 'fs_info->unused_bgs' 
> list.
> But in btrfs_delete_unused_bgs() we have this check:
>                 if (block_group->reserved || block_group->pinned ||
>                     block_group->used || block_group->ro ||
>                     list_is_singular(&block_group->list)) {
>                         /*
>                          * We want to bail if we made new allocations or have
>                          * outstanding allocations in this block group.  We do
>                          * the ro check in case balance is currently acting on
>                          * this block group.
>                          */
>                         trace_btrfs_skip_unused_block_group(block_group);
>                         spin_unlock(&block_group->lock);
>                         up_write(&space_info->groups_sem);
>                         goto next;
>                 }
> So we're skipping over the removal of the block group. I've instrumented the
> kernel and also tested with non-zoned devices, this skip is always present
> with block_group->ro = 1.

Yep, expected, see below.

> Also the auto-relocation patch has a problem, not decrementing block_group->ro
> and so it's left at ro = 2 after auto relocation got triggered.
> So the question is, why are we leaving block_group->ro at 1 after relocation
> finished? Probably that the allocator doesn't pick the block group for the 
> next
> allocation.

Yes. Otherwise after relocating (or even during relocation), a task
allocates an extent from the block group, and then blindly after we
delete the block group - this would be racy and cause corruption.

> What am I missing?

And what about the other mechanism that triggers discards on pinned
extents, after the transaction commits the super blocks?
Why isn't that happening (with -o discard=sync)? We create the delayed
references to drop extents from the relocated block group, which
results in pinning extents.
This is the case that surprised me that it isn't working for you.


> Thanks,
>         Johannes

Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Reply via email to