On 3/19/18 2:08 PM, David Sterba wrote:
> On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
>> On 3/16/18 4:12 PM, David Sterba wrote:
>>> On Fri, Mar 16, 2018 at 02:36:27PM -0400, [email protected] wrote:
>>>> From: Jeff Mahoney <[email protected]>
>>>>
>>>> While running btrfs/011, I hit the following lockdep splat.
>>>>
>>>> This is the important bit:
>>>>    pcpu_alloc+0x1ac/0x5e0
>>>>    __percpu_counter_init+0x4e/0xb0
>>>>    btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>>>    btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>>>    resolve_indirect_refs+0x130/0x830 [btrfs]
>>>>    find_parent_nodes+0x69e/0xff0 [btrfs]
>>>>    btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>>>    btrfs_find_all_roots+0x50/0x70 [btrfs]
>>>>    btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>>>    btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>>>
>>>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>>>> uses GFP_KERNEL, which we can't do during transaction commit.
>>>>
>>>> This switches it to GFP_NOFS.
>>>
>>>> Signed-off-by: Jeff Mahoney <[email protected]>
>>>> ---
>>>>  fs/btrfs/disk-io.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 21f34ad0d411..eb6bb3169a9e 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers *btrfs_alloc_subvolume_writers(void)
>>>>    if (!writers)
>>>>            return ERR_PTR(-ENOMEM);
>>>>  
>>>> -  ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>>>> +  ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>>>
>>> A line above the diff context is another allocation that does GFP_NOFS,
>>> so one of the gfp flags was wrong.
>>>
>>> Looks like there's another instance where percpu allocates with
>>> GFP_KERNEL: create_space_info that can be called from the path that
>>> allocates chunks, so this also looks like a NOFS candidate.
>>
>> We can get rid of this case entirely.  Those call sites should be
>> removed since the space_infos are all allocated at mount time.
> 
> That would be great and make a few things simpler. So this means that
> __find_space_info never fails once the space infos are properly
> initialized, right? That was my concern in do_chunk_alloc and
> btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

That's a different case.  The raid levels are added when the first block
group of a particular raid level is loaded.  That can happen when the
block groups are read in initially, where it should be safe to use
GFP_KERNEL, or when a chunk of a new type is allocated.  The thing is
that a chunk of a new type will only be allocated when we're converting
via balance, so we may be able to do the kobject_add for the raid level
when we start the balance rather than wait for it to create the block group.
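
To make the GFP_KERNEL vs. GFP_NOFS distinction concrete, here is a toy
user-space model.  It is not kernel code: the TOY_* flag values, struct
toy_ctx and toy_alloc_may_deadlock() are all made up for illustration and
only mimic the idea that an allocation allowed to recurse into the
filesystem is unsafe while a transaction commit is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's gfp bits; the values are invented. */
#define TOY_GFP_IO     0x1u
#define TOY_GFP_FS     0x2u
#define TOY_GFP_NOFS   (TOY_GFP_IO)
#define TOY_GFP_KERNEL (TOY_GFP_IO | TOY_GFP_FS)

/* Hypothetical model of "this task is inside a transaction commit". */
struct toy_ctx {
	bool in_commit;
};

/*
 * Returns true when an allocation with these flags could recurse into
 * the filesystem while the filesystem is already committing, which is
 * the kind of dependency lockdep complains about.
 */
bool toy_alloc_may_deadlock(const struct toy_ctx *ctx, unsigned int gfp)
{
	return ctx->in_commit && (gfp & TOY_GFP_FS) != 0;
}
```

In this model, a context with in_commit set and TOY_GFP_KERNEL is flagged,
while the same context with TOY_GFP_NOFS is not, and a mount-time context
(in_commit clear) is fine with either.  That is the reason the initial
block group read can keep GFP_KERNEL and the chunk allocation path can't.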

-Jeff


-- 
Jeff Mahoney
SUSE Labs
