Hi,

> > kernel: at least 5.10.26/5.10.27/5.10.28
> > 
> > This problem is triggered by our application, NOT xfstests.
> > But our application has a heavy write load, just like
> > xfstest/generic/476.
> > Our application uses at most 75% of memory; if that is still not enough,
> > it writes all buffered data out to the filesystem.
> 
> Do you use cgroups at all? If yes can you describe the workload pattern
> a bit.

cgroups are enabled by default, so they are in use.

This is the output of systemd-cgls; "samtools.nipt sort -m 60G" is one
of our applications. But our application is NOT cgroups-aware, and it does
NOT call any cgroup interface directly.

Control group /:
-.slice
├─user.slice
│ └─user-0.slice
│   ├─session-55.scope
│   │ ├─48747 sshd: root [priv]
│   │ ├─48788 sshd: root@notty
│   │ ├─48795 perl -e @GNU_Parallel=split/_/,"use_IPC::Open3;_use_MIME::Base6...
│   │ ├─48943 samtools.nipt sort -m 60G -T /nodetmp//nfs/biowrk/baseline.wgs2...
│   │ ├─....
│   └─user@0.service
│     └─init.scope
│       ├─48775 /usr/lib/systemd/systemd --user
│       └─48781 (sd-pam)
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
└─system.slice
  ├─rngd.service
  │ └─1577 /sbin/rngd -f --fill-watermark=0
  ├─irqbalance.service
  │ └─1543 /usr/sbin/irqbalance --foreground
....


> > This problem happens in Linux kernel 5.10.x, but not in Linux
> > kernel 5.4.x. It reproduces with high frequency, too.
> 
> Ah. Can you try the following patch?
> https://lore.kernel.org/lkml/20210408035736.883861-4-g...@fb.com/
> 
> Thanks,
> Dennis

kernel: kernel 5.10.28 + this patch
result: the problem has not happened yet after 4 test runs.
          Without this patch, the reproduction frequency is >50%.

And a question about this,
> > > > upper caller:
> > > >     nofs_flag = memalloc_nofs_save();
> > > >     ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > > >     memalloc_nofs_restore(nofs_flag);
> 
> The issue is here. nofs is set which means percpu attempts an atomic
> allocation. If it cannot find anything already allocated it isn't happy.
> This was done before memalloc_nofs_{save/restore}() were pervasive.
> 
> Percpu should probably try to allocate some pages if possible even if
> nofs is set.
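
If I understand the explanation above correctly, the nofs scope strips the
FS bit from the allocation flags, and percpu treats any request that is less
than a full GFP_KERNEL as atomic. Below is a minimal userspace sketch of that
reading. The MODEL_* flag values and the model_* helpers are hypothetical
names made up for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag bits for this model only; NOT the kernel's real values. */
#define MODEL_GFP_IO       0x1u
#define MODEL_GFP_FS       0x2u
#define MODEL_GFP_RECLAIM  0x4u
#define MODEL_GFP_KERNEL   (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Models the effect of a memalloc_nofs_save() scope: while the scope is
 * active, the FS bit is removed from every allocation request.
 */
static unsigned int model_gfp_context(unsigned int gfp, bool nofs_scope)
{
    return nofs_scope ? (gfp & ~MODEL_GFP_FS) : gfp;
}

/*
 * Models the percpu decision as described above: a request that does not
 * carry all GFP_KERNEL bits is handled on the atomic path, which can only
 * use pages that are already populated.
 */
static bool model_is_atomic(unsigned int gfp)
{
    return (gfp & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL;
}
```

With this model, model_is_atomic(model_gfp_context(MODEL_GFP_KERNEL, true))
is true, which matches the btrfs_drew_lock_init() call path quoted above.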

Should we check and pre-allocate memory inside memalloc_nofs_restore()?
Another memalloc_nofs_save() may come soon.

Something like this in memalloc_nofs_save()?
        if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_LOW)
                pcpu_schedule_balance_work();
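
To make the idea concrete, here is a small userspace model of the low
watermark check, only a sketch and not kernel code. The struct, the model_*
functions and the LOW/HIGH values are assumptions chosen for illustration;
the real balance work runs asynchronously and is more involved.

```c
#include <stdbool.h>

/* Illustrative watermarks for this model only. */
#define MODEL_EMPTY_POP_PAGES_LOW   2
#define MODEL_EMPTY_POP_PAGES_HIGH  4

/* Hypothetical stand-in for the percpu allocator state. */
struct model_pcpu {
    int  nr_empty_pop_pages;  /* populated pages not yet handed out */
    bool balance_scheduled;   /* refill work has been requested */
};

/* The proposed check: request a refill when the reserve runs low. */
static void model_check_watermark(struct model_pcpu *p)
{
    if (p->nr_empty_pop_pages < MODEL_EMPTY_POP_PAGES_LOW)
        p->balance_scheduled = true;
}

/* Atomic allocation: succeeds only if a populated page is available. */
static bool model_atomic_alloc(struct model_pcpu *p)
{
    if (p->nr_empty_pop_pages == 0)
        return false;
    p->nr_empty_pop_pages--;
    return true;
}

/* Refill work: runs outside the nofs scope, so it may populate pages. */
static void model_balance_work(struct model_pcpu *p)
{
    if (!p->balance_scheduled)
        return;
    if (p->nr_empty_pop_pages < MODEL_EMPTY_POP_PAGES_HIGH)
        p->nr_empty_pop_pages = MODEL_EMPTY_POP_PAGES_HIGH;
    p->balance_scheduled = false;
}
```

In this model, calling model_check_watermark() before entering a nofs scope
and letting model_balance_work() run first means a later
model_atomic_alloc() finds pages available instead of failing.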


By the way, the problem quoted below still happens with kernel 5.10.28 + this patch.
Is this a PANIC without an OOPS? Any guidance for troubleshooting, please?
> problem:
> The OS/VGA console is frozen, and no call trace is output.
> Only some info is output to IPMI/Dell iDRAC:
>    2 | 04/03/2021 | 11:35:01 | OS Critical Stop #0x46 | Run-time critical stop () | Asserted
>    3 | Linux kernel panic: Fatal excep
>    4 | Linux kernel panic: tion

Best Regards
Wang Yugui (wangyu...@e16-tech.com)
2021/04/08
