On 12/04/2021 16:21, Johannes Thumshirn wrote:
>> If it affects only the zoned case, I also don't see why doing it when
>> not in zoned mode (and -o discard=sync is set).
>> Unless of course the default discard mechanism is broken somehow, in
>> which case we need to find out why and fix it.
> 
> I'm at it.
> 

OK I've found out what's wrong. In btrfs_relocate_block_group() we're calling
btrfs_inc_block_group_ro(). btrfs_update_block_group() will call 
btrfs_mark_bg_unused() to add the block group to the 'fs_info->unused_bgs' list.

But in btrfs_delete_unused_bgs() we have this check:
                if (block_group->reserved || block_group->pinned ||             
                                                                                
                     
                    block_group->used || block_group->ro ||                     
                                                                                
                     
                    list_is_singular(&block_group->list)) {                     
                                                                                
                     
                        /*                                                      
                                                                                
                     
                         * We want to bail if we made new allocations or have   
                                                                                
                     
                         * outstanding allocations in this block group.  We do  
                                                                                
                     
                         * the ro check in case balance is currently acting on  
                                                                                
                     
                         * this block group.                                    
                                                                                
                     
                         */                                                     
                                                                                
                     
                        trace_btrfs_skip_unused_block_group(block_group);       
                                                                                
                     
                        spin_unlock(&block_group->lock);                        
                                                                                
                     
                        up_write(&space_info->groups_sem);                      
                                                                                
                     
                        goto next;                                              
                                                                                
                     
                }                                                           

So we're skipping over the removal of the block group. I've instrumented the
kernel and also tested with non-zoned devices, this skip is always present
with block_group->ro = 1.

Also the auto-relocation patch has a problem, not decrementing block_group->ro
and so it's left at ro = 2 after auto relocation got triggered.

So the question is, why are we leaving block_group->ro at 1 after relocation 
finished? Probably that the allocator doesn't pick the block group for the next
allocation.

What am I missing?

Thanks,
        Johannes

Reply via email to