We've seen this on a wide variety of workloads, including general user
logins with NFS mounts, SLURM head and cluster nodes, a
Prometheus/Grafana server, a Grafana Loki server, two Exim servers, a
Samba server, LDAP servers, Matlab license servers, and a monitoring
machine that just runs conserver. It seems to be correlated with the
number of processes and the amount of activity on a machine, as the two
machines that leaked the most are our primary general-use login server
and our Prometheus server (which constantly runs a churn of
monitoring and probe activity). Because the leak tracks general activity
rather than any one program, I don't currently have any particular
commands that reproduce it.

It may be relevant that we are auditing some system calls. The generated 
/etc/audit/audit.rules on our servers has:
-D
-b 8192
-f 1
--backlog_wait_time 60000
-a exit,always -F arch=b64 -S execve
-a exit,always -F arch=b32 -S execve

We also send audit records only to log files, by masking
systemd-journald-audit.socket.

I will see if I can reproduce this in a VM by generating random activity
(I'm going to try repeatedly compiling something over and over), first
in our standard configuration and then in a more minimal one. It will
likely take at least a day or two to know one way or another.
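
A minimal sketch of how such a test could be driven, under some loud
assumptions: it spawns trivial short-lived processes instead of running
compiles, the function names are made up for illustration, and it is not
known whether this kind of churn actually triggers the leak. It reads the
kmalloc-2k line of /proc/slabinfo (which normally requires root) before
and after the churn and reports the difference.

```python
import subprocess
import sys


def kmalloc_2k_kb(slabinfo_text):
    """Return the total size of the kmalloc-2k cache in KB.

    Expects text in the /proc/slabinfo 2.x layout, where the columns after
    the cache name are: active_objs, num_objs, objsize, ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "kmalloc-2k":
            num_objs, objsize = int(fields[2]), int(fields[3])
            return num_objs * objsize // 1024
    raise ValueError("no kmalloc-2k line found")


def churn(count):
    """Spawn `count` short-lived child processes, one at a time."""
    for _ in range(count):
        # Each child is a separate exec, so an execve audit rule sees it.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return count


def measure_growth(count, path="/proc/slabinfo"):
    """Return kmalloc-2k growth in KB across `count` spawned processes."""
    with open(path) as f:  # reading this file usually needs root
        before = kmalloc_2k_kb(f.read())
    churn(count)
    with open(path) as f:
        after = kmalloc_2k_kb(f.read())
    return after - before
```

Something like measure_growth(100000) run as root would give one data
point. Slab usage fluctuates on its own, so a single small difference
would not mean much; only steady growth across repeated runs would be
suggestive.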

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1987430

Title:
  Ubuntu 22.04 kernel 5.15.0-46-generic leaks kernel memory in
  kmalloc-2k slabs

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Since updating to kernel 5.15.0-46-generic (package version
  5.15.0-46.49), all of our Ubuntu 22.04 LTS servers are leaking kernel
  memory; our first server with 8 GB of RAM just fatally OOMed, causing
  us to detect this. Inspection of OOM reports, /proc/meminfo, and
  /proc/slabinfo says that it's mostly going to unreclaimable kmalloc-2k
  slabs:

          Aug 23 12:51:11 cluster kernel: [361299.864757] Unreclaimable slab info:
          Aug 23 12:51:11 cluster kernel: [361299.864757] Name                      Used          Total
          [...]
          Aug 23 12:51:11 cluster kernel: [361299.864924] kmalloc-2k           6676584KB    6676596KB

  Most of our machines appear to be leaking slab memory at a rate of
  around 20 to 40 Mbytes/hour, with some machines leaking much faster;
  the champions are leaking kernel memory at 145 Mbytes/hour and 237
  Mbytes/hour.
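
As a rough sketch of how a rate like this can be estimated (the function
names are made up for illustration, and this is not necessarily how the
figures above were obtained): take two samples of the SUnreclaim field
from /proc/meminfo some time apart and scale the difference to an hour.

```python
import time


def read_sunreclaim_kb(meminfo_text):
    """Return the SUnreclaim value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SUnreclaim:"):
            # The line looks like: "SUnreclaim:      123456 kB"
            return int(line.split()[1])
    raise ValueError("no SUnreclaim line found")


def leak_rate_mb_per_hour(kb_start, kb_end, seconds):
    """Scale the growth between two samples to Mbytes/hour."""
    return (kb_end - kb_start) / 1024 * 3600 / seconds


def sample_rate(interval_s, path="/proc/meminfo"):
    """Sample SUnreclaim twice, interval_s seconds apart, and return the rate."""
    with open(path) as f:
        first = read_sunreclaim_kb(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = read_sunreclaim_kb(f.read())
    return leak_rate_mb_per_hour(first, second, interval_s)
```

A longer interval (an hour or more) gives a steadier number, since
unreclaimable slab usage also moves up and down with normal activity.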

  We aren't running any proprietary kernel modules and our only unusual
  kernel configuration is that we've disabled AppArmor with 'apparmor=0'
  on the kernel command line.

  /proc/version_signature:
  Ubuntu 5.15.0-46.49-generic 5.15.39

  Full kernel command line from the Dell R240 system that fatally OOMd:
  BOOT_IMAGE=/boot/vmlinuz-5.15.0-46-generic root=UUID=3165564f-a2dd-4b39-935b-114f3e23ff54 ro console=ttyS0,115200 console=tty0 apparmor=0

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987430/+subscriptions

