On Tue, Apr 13, 2021 at 1:44 PM Johannes Thumshirn
<johannes.thumsh...@wdc.com> wrote:
>
> On 12/04/2021 16:21, Johannes Thumshirn wrote:
> >> If it affects only the zoned case, I also don't see why doing it when
> >> not in zoned mode (and -o discard=sync is set).
> >> Unless of course the default discard mechanism is broken somehow, in
> >> which case we need to find out why and fix it.
> >
> > I'm at it.
> >
>
> OK I've found out what's wrong. In btrfs_relocate_block_group() we're calling
> btrfs_inc_block_group_ro(). btrfs_update_block_group() will call
> btrfs_mark_bg_unused() to add the block group to the 'fs_info->unused_bgs' 
> list.
>
> But in btrfs_delete_unused_bgs() we have this check:
>                 if (block_group->reserved || block_group->pinned ||
>                     block_group->used || block_group->ro ||
>                     list_is_singular(&block_group->list)) {
>                         /*
>                          * We want to bail if we made new allocations or have
>                          * outstanding allocations in this block group.  We do
>                          * the ro check in case balance is currently acting on
>                          * this block group.
>                          */
>                         trace_btrfs_skip_unused_block_group(block_group);
>                         spin_unlock(&block_group->lock);
>                         up_write(&space_info->groups_sem);
>                         goto next;
>                 }
>
> So we're skipping over the removal of the block group. I've instrumented the
> kernel and also tested with non-zoned devices, this skip is always present
> with block_group->ro = 1.

Yep, expected, see below.

>
> Also the auto-relocation patch has a problem, not decrementing block_group->ro
> and so it's left at ro = 2 after auto relocation got triggered.
>
> So the question is, why are we leaving block_group->ro at 1 after relocation
> finished? Probably that the allocator doesn't pick the block group for the 
> next
> allocation.

Yes. Otherwise after relocating (or even during relocation), a task
allocates an extent from the block group, and then blindly after we
delete the block group - this would be racy and cause corruption.

>
> What am I missing?

And what about the other mechanism that triggers discards on pinned
extents, after the transaction commits the super blocks?
Why isn't that happening (with -o discard=sync)? We create the delayed
references to drop extents from the relocated block group, which
results in pinning extents.
This is the case that surprised me that it isn't working for you.

Thanks.

>
> Thanks,
>         Johannes



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Reply via email to