Public bug reported:

We are running our Kafka brokers on Jammy on ARM64. Previously they were
on kernel version 5.15.0-1028-aws, but a few weeks ago we built a new
AMI that picked up 6.2.0-1009-aws. We have since upgraded to
6.2.0-1012-aws and see the same problem.

What we expected to happen:
TCP memory (TCP_MEM) fluctuates but stays relatively low (on a busy production
broker running 5.15.0-1028-aws, we average 1900 pages over a 24-hour period).

What happened instead:
TCP memory (TCP_MEM) continues to rise until hitting the limit (1.5 million 
pages as configured currently). At this point, the broker is no longer able to 
properly create new connections and we start seeing "kernel: TCP: out of memory 
-- consider tuning tcp_mem" in dmesg output. If allowed to continue, the broker 
will eventually isolate itself from the rest of the cluster since it can't talk 
to the other brokers.
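
To put the page counts above into more familiar units, here is a small
conversion sketch. It assumes a 4 KiB page size, which the report does not
state; on a live host, os.sysconf("SC_PAGE_SIZE") gives the actual value.
The helper name pages_to_mib is made up for illustration.

```python
import os


def pages_to_mib(pages, page_size=None):
    """Convert a count of memory pages to MiB.

    If page_size is not given, ask the running system for its page size.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size / (1024 * 1024)


# Assumption: 4096-byte pages (not confirmed in the report).
ASSUMED_PAGE_SIZE = 4096

# Average usage reported on 5.15.0-1028-aws.
print(f"{pages_to_mib(1900, ASSUMED_PAGE_SIZE):.1f} MiB")
# Configured limit reached on 6.2.0-*-aws.
print(f"{pages_to_mib(1_500_000, ASSUMED_PAGE_SIZE):.1f} MiB")
```

Under that assumption, the healthy average is on the order of a few MiB,
while the configured limit is several GiB.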

Attached is a graph of the average TCP memory usage per kernel version
for our production environment over the past 24 hours.
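
For anyone who wants to sample the same metric, the sketch below shows one
way to read it. It assumes the graphed value is the "mem" field on the
"TCP:" line of /proc/net/sockstat (as the attachment title suggests) and
that the limit is the third value of /proc/sys/net/ipv4/tcp_mem, both in
pages. The function names are illustrative and not part of any tool used
in this report.

```python
def tcp_pages_in_use(sockstat_text):
    """Return the 'mem' value (in pages) from the 'TCP:' line of sockstat."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            # The line looks like: "TCP: inuse 5 orphan 0 tw 0 alloc 7 mem 1"
            fields = line.split()[1:]
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["mem"]
    raise ValueError("no 'TCP:' line found in sockstat text")


def tcp_mem_limits(tcp_mem_text):
    """Parse the three tcp_mem thresholds (low, pressure, high), in pages."""
    low, pressure, high = (int(v) for v in tcp_mem_text.split())
    return low, pressure, high


def read_text(path):
    with open(path) as f:
        return f.read()


def report(sockstat_path="/proc/net/sockstat",
           tcp_mem_path="/proc/sys/net/ipv4/tcp_mem"):
    """Print current TCP page usage relative to the configured high limit."""
    used = tcp_pages_in_use(read_text(sockstat_path))
    low, pressure, high = tcp_mem_limits(read_text(tcp_mem_path))
    print(f"tcp mem: {used} pages "
          f"(low={low}, pressure={pressure}, high={high}, "
          f"{100.0 * used / high:.2f}% of high)")
```

Calling report() at a fixed interval on a broker and recording the output
alongside `uname -r` would give a per-kernel comparison along the lines of
the attached graph.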

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-aws 6.2.0.1012.12~22.04.1
ProcVersionSignature: Ubuntu 6.2.0-1012.12~22.04.1-aws 6.2.16
Uname: Linux 6.2.0-1012-aws aarch64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: arm64
CasperMD5CheckResult: unknown
CloudArchitecture: aarch64
CloudID: aws
CloudName: aws
CloudPlatform: ec2
CloudRegion: us-east-1
CloudSubPlatform: metadata (http://169.254.169.254)
Date: Mon Sep 25 20:56:02 2023
Ec2AMI: ami-0b9c5aafc5b2a4725
Ec2AMIManifest: (unknown)
Ec2Architecture: arm64
Ec2AvailabilityZone: us-east-1b
Ec2Imageid: ami-0b9c5aafc5b2a4725
Ec2InstanceType: im4gn.4xlarge
Ec2Instancetype: im4gn.4xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Ec2Region: us-east-1
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-meta-aws-6.2
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux-meta-aws-6.2 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-bug arm64 ec2-images jammy

** Attachment added: "Graph of tcp memory usage from /proc/net/sockstat"
   https://bugs.launchpad.net/bugs/2037335/+attachment/5704376/+files/tcp_mem_leak.png

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-aws-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037335

Title:
  kernel leaking TCP_MEM

Status in linux-meta-aws-6.2 package in Ubuntu:
  New

