On Thu, Apr 08, 2021 at 05:20:00PM +0800, Wang Yugui wrote:
> Hi,
>
> > On Thu, Apr 08, 2021 at 07:28:01AM +0800, Wang Yugui wrote:
> > > Hi,
> > >
> > > > > > > upper caller:
> > > > > > > nofs_flag = memalloc_nofs_save();
> > > > > > > ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > > > > > > memalloc_nofs_restore(nofs_flag);
> > > >
> > > > The issue is here. nofs is set which means percpu attempts an atomic
> > > > allocation. If it cannot find anything already allocated it isn't happy.
> > > > This was done before memalloc_nofs_{save/restore}() were pervasive.
> > > >
> > > > Percpu should probably try to allocate some pages if possible even if
> > > > nofs is set.
> > >
> > > Thanks.
> > >
> > > I will wait for the patch, and then test it.
> > >
> >
> > I'm currently a bit busy with some other things. Adding support I don't
> > think will be much work, just a little bit tricky.
> >
> > I recommend carrying what you have minus the change to reserved percpu
> > memory for now. If I'm the one to write it, I'll cc you.
> >
> > Thanks,
> > Dennis
>
>
> In recent tests, another problem was also triggered with my extended
> percpu buffer size patch. Maybe this info is helpful.
>
> problem:
> The OS/VGA console is frozen, and no call trace is output.
> Only some info is output to IPMI/Dell iDRAC:
> 2 | 04/03/2021 | 11:35:01 | OS Critical Stop #0x46 | Run-time critical
> stop () | Asserted
> 3 | Linux kernel panic: Fatal excep
> 4 | Linux kernel panic: tion
> 5 | 04/05/2021 | 19:09:14 | OS Critical Stop #0x46 | Run-time critical
> stop () | Asserted
> 6 | Linux kernel panic: Fatal excep
> 7 | Linux kernel panic: tion
> 8 | 04/06/2021 | 13:08:42 | OS Critical Stop #0x46 | Run-time critical
> stop () | Asserted
> 9 | Linux kernel panic: Fatal excep
> a | Linux kernel panic: tion
> b | 04/08/2021 | 02:12:46 | OS Critical Stop #0x46 | Run-time critical
> stop () | Asserted
> c | Linux kernel panic: Fatal excep
> d | Linux kernel panic: tion

Unfortunately, none of the above is useful to me.

> kernel: at least 5.10.26/5.10.27/5.10.28
>
> This problem is triggered by our application, NOT xfstests.
> But our application has a heavy write load, just like xfstests generic/476.
> Our application uses at most 75% of memory; if that is still not enough,
> it writes all buffered info out to the filesystem.

Do you use cgroups at all? If so, can you describe the workload pattern
a bit?

> This problem happens in Linux kernel 5.10.x, but not in Linux
> kernel 5.4.x. It also reproduces with high frequency.

Ah. Can you try the following patch?

https://lore.kernel.org/lkml/[email protected]/

Thanks,
Dennis