Hi,

> On 30.03.21 г. 9:24, Wang Yugui wrote:
> > Hi, Nikolay Borisov
> >
> > With a lot of dump_stack()/printk inserted around ENOMEM in btrfs code,
> > we find out the call stack for ENOMEM.
> > see the file 0000-btrfs-dump_stack-when-ENOMEM.patch
> >
> >
> > #cat /usr/hpc-bio/xfstests/results//generic/476.dmesg
> > ...
> > [ 5759.102929] ENOMEM btrfs_drew_lock_init
> > [ 5759.102943] ENOMEM btrfs_init_fs_root
> > [ 5759.102947] ------------[ cut here ]------------
> > [ 5759.102950] BTRFS: Transaction aborted (error -12)
> > [ 5759.103052] WARNING: CPU: 14 PID: 2741468 at
> > /ssd/hpc-bio/linux-5.10.27/fs/btrfs/transaction.c:1705
> > create_pending_snapshot+0xb8c/0xd50 [btrfs]
> > ...
> >
> >
> > btrfs_drew_lock_init() return -ENOMEM,
> > this is the source:
> >
> > /*
> >  * We might be called under a transaction (e.g. indirect backref
> >  * resolution) which could deadlock if it triggers memory reclaim
> >  */
> > nofs_flag = memalloc_nofs_save();
> > ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > memalloc_nofs_restore(nofs_flag);
> > if (ret == -ENOMEM) printk("ENOMEM btrfs_drew_lock_init\n");
> > if (ret)
> >         goto fail;
> >
> > And the source come from:
> >
> > commit dcc3eb9638c3c927f1597075e851d0a16300a876
> > Author: Nikolay Borisov <nbori...@suse.com>
> > Date: Thu Jan 30 14:59:45 2020 +0200
> >
> > btrfs: convert snapshot/nocow exlcusion to drew lock
> >
> >
> > Any advice to fix this ENOMEM problem?
>
> This is likely coming from changed behavior in MM, doesn't seem related
> to btrfs. We have multiple places where nofs_save() is called. By the
> same token the failure might have occurred in any other place, in any
> other piece of code which uses memalloc_nofs_save, there is no
> indication that this is directly related to btrfs.
>
> >
> > top command show that this server have enough memory.
> >
> > The hardware of this server:
> > CPU: Xeon(R) CPU E5-2660 v2(10 core) *2
> > memory: 192G, no swap
>
> You are showing that the server has 192G of installed memory, you have
> not shown any stats which prove at the time of failure what is the state
> of the MM subsystem. At the very least at the time of failure inspect
> the output of :
>
> cat /proc/meminfo
>
> and "free -m" commands.
>
> <snip>
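
The advice quoted above (inspect /proc/meminfo at the time of failure) could be scripted roughly as follows. This is only a sketch, not something taken from the thread: the helper names parse_meminfo and snapshot are hypothetical, and the set of fields printed is an assumption about what might be worth looking at, not a diagnosis of this ENOMEM.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integer on each line (kB for most fields,
    a plain count for fields such as HugePages_Total).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot(path="/proc/meminfo"):
    """Read and parse the current memory statistics (Linux only)."""
    return parse_meminfo(Path(path).read_text())


if __name__ == "__main__":
    # Hypothetical selection of fields; adjust to whatever is being investigated.
    fields = ("MemTotal", "MemFree", "MemAvailable", "Slab", "SUnreclaim", "Percpu")
    info = snapshot()
    for field in fields:
        if field in info:
            print(f"{field:>14}: {info[field]} kB")
```

Running something like this in a loop while generic/476 executes, and keeping the last snapshot taken before the WARNING appears in dmesg, would give the state of the MM subsystem close to the failure rather than only the installed memory size.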

Only one xfstest job is running on this server.

Best Regards
Wang Yugui (wangyu...@e16-tech.com)
2021/03/30