On 30.03.21 г. 9:24, Wang Yugui wrote:
> Hi, Nikolay Borisov
> With a lot of dump_stack()/printk calls inserted around ENOMEM in the btrfs code,
> we found the call stack for the ENOMEM.
> See the file 0000-btrfs-dump_stack-when-ENOMEM.patch
> #cat /usr/hpc-bio/xfstests/results//generic/476.dmesg
> [ 5759.102929] ENOMEM btrfs_drew_lock_init
> [ 5759.102943] ENOMEM btrfs_init_fs_root
> [ 5759.102947] ------------[ cut here ]------------
> [ 5759.102950] BTRFS: Transaction aborted (error -12)
> [ 5759.103052] WARNING: CPU: 14 PID: 2741468 at create_pending_snapshot+0xb8c/0xd50 [btrfs]
> btrfs_drew_lock_init() returns -ENOMEM;
> this is the source:
> /*
>  * We might be called under a transaction (e.g. indirect backref
>  * resolution) which could deadlock if it triggers memory reclaim
>  */
> nofs_flag = memalloc_nofs_save();
> ret = btrfs_drew_lock_init(&root->snapshot_lock);
> if (ret == -ENOMEM) printk("ENOMEM btrfs_drew_lock_init\n");
> if (ret)
> goto fail;
> And the source comes from:
> commit dcc3eb9638c3c927f1597075e851d0a16300a876
> Author: Nikolay Borisov <nbori...@suse.com>
> Date: Thu Jan 30 14:59:45 2020 +0200
> btrfs: convert snapshot/nocow exlcusion to drew lock
> Any advice to fix this ENOMEM problem?
This is likely caused by changed behavior in MM and does not seem related
to btrfs. We call memalloc_nofs_save() in multiple places, and by the
same token the failure could have occurred in any other code that uses
memalloc_nofs_save(). There is no indication that this is directly
related to btrfs.
> The top command shows that this server has enough memory.
> The hardware of this server:
> CPU: Xeon(R) CPU E5-2660 v2(10 core) *2
> memory: 192G, no swap
You have shown that the server has 192G of installed memory, but you have
not shown any stats that reveal the state of the MM subsystem at the time
of failure. At the very least, at the time of failure inspect
the output of :
and "free -m" commands.
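
If it helps to capture that state automatically when the test fails, a
small userspace helper along these lines could be run from the test
harness. This is only a rough sketch: the function names
(meminfo_field_kb, read_file, print_meminfo_snapshot) are hypothetical,
and the choice of fields to print is my assumption about what is useful
here, not an exhaustive list.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one field (e.g. "MemAvailable") in text formatted like
 * /proc/meminfo and return its value in kB, or -1 if it is absent.
 * Hypothetical helper, not part of any kernel or btrfs tooling.
 */
long meminfo_field_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* A match is "<key>:" at the very start of a line. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}

/* Read a file into buf and NUL-terminate it. Returns 0 or -1. */
int read_file(const char *path, char *buf, size_t size)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/* Print a few fields; the selection is an assumption, adjust as needed. */
int print_meminfo_snapshot(void)
{
	const char *keys[] = { "MemFree", "MemAvailable", "Slab",
			       "SUnreclaim", "Percpu" };
	char buf[8192];
	size_t i;

	if (read_file("/proc/meminfo", buf, sizeof(buf)) < 0)
		return -1;
	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		printf("%-14s %ld kB\n", keys[i],
		       meminfo_field_kb(buf, keys[i]));
	return 0;
}
```

Calling print_meminfo_snapshot() right after the failing test step gives
numbers that can be compared against a snapshot taken before the run. A
field that the running kernel does not report is printed as -1.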