Just some additional notes for anyone who might stumble upon this same
issue in the future.

I did a bunch more testing with this and confirmed that it was indeed
net.ipv4.tcp_mem that needed to be tweaked. I set the r/wmem settings back
to the defaults and ultimately set tcp_mem to "512000 512000 512000".
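
For reference, here is a minimal sketch of how that setting could be
applied. The helper name tcp_mem_conf_line and the file name
90-tcp-mem.conf are made up for illustration. sysctl -w and
sysctl --system are the standard procps commands, and both need root.

```shell
# Hypothetical helper: print a sysctl.d line that sets all three tcp_mem
# thresholds (min, pressure, max) to the same page count.
tcp_mem_conf_line() {
    printf 'net.ipv4.tcp_mem = %s %s %s\n' "$1" "$1" "$1"
}

# Apply at runtime (needs root):
#   sysctl -w net.ipv4.tcp_mem="512000 512000 512000"
# Persist across reboots (file name is an assumption):
#   tcp_mem_conf_line 512000 > /etc/sysctl.d/90-tcp-mem.conf
#   sysctl --system
```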

This was hard to find because the kernel doesn't report anything when it
enters a "memory pressure" state. I would never have known to check this,
especially since most sites say not to touch this parameter! Older
kernels didn't behave this way, and I'm not sure exactly what changed.

You can check current TCP memory usage like this:

# cat /proc/net/sockstat
sockets: used 5392
TCP: inuse 7214 orphan 1927 tw 14811 alloc 7217 mem 317331

The mem value is in pages, the same unit as the three tcp_mem values (min,
pressure, max), so the two can be compared directly.
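
If you want that number in a script, something like the following should
work. It is only a sketch: the function name sockstat_tcp_mem is
hypothetical, and it assumes the "TCP: ... mem N" layout shown above.

```shell
# Print the "mem" field (in pages) from the TCP: line of sockstat-style
# input read from stdin.
sockstat_tcp_mem() {
    awk '$1 == "TCP:" { for (i = 2; i < NF; i++) if ($i == "mem") print $(i + 1) }'
}

# Usage on a live system:
#   sockstat_tcp_mem < /proc/net/sockstat
#   cat /proc/sys/net/ipv4/tcp_mem    # min, pressure, max (also in pages)
```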

Once TCP memory usage reaches the middle value of tcp_mem (pressure), the
kernel enters "memory pressure mode." I can't find any documentation on
what this means exactly, only hints here and there.

For example, if I set tcp_mem back to the defaults of '94401 125868 188802',
the mem value quickly reaches the pressure value (125868) and the kernel
starts to limit r/w buffer sizes (I think), so mem drops pretty quickly to
below 125K, which of course throttles downloads considerably. Once I set
tcp_mem back up to 512K, the mem value climbs back to over 300K almost
instantly.

It appears that the best way to determine if tcp_mem needs to be adjusted
is to check /proc/net/sockstat and see if the TCP mem value is anywhere
close to tcp_mem's middle / pressure value. If it is, try setting tcp_mem
to a much higher value and see if /proc/net/sockstat TCP mem increases
drastically. If it does, that means tcp_mem needs to be set higher.
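
The check described above could be scripted roughly like this. The
function name tcp_mem_status and the 90% "near" cutoff are my own
assumptions for illustration, not anything the kernel defines.

```shell
# Classify TCP memory usage against the tcp_mem pressure threshold.
# $1 = current TCP mem in pages (from /proc/net/sockstat)
# $2 = tcp_mem pressure value in pages (middle value of tcp_mem)
# The 90% "near" cutoff is an arbitrary assumption.
tcp_mem_status() {
    pct=$(( $1 * 100 / $2 ))    # integer percentage, rounded down
    if [ "$pct" -ge 100 ]; then
        echo "at or over pressure (${pct}%)"
    elif [ "$pct" -ge 90 ]; then
        echo "near pressure (${pct}%)"
    else
        echo "ok (${pct}%)"
    fi
}

# Usage on a live system:
#   mem=$(awk '$1 == "TCP:" { print $NF }' /proc/net/sockstat)
#   pressure=$(awk '{ print $2 }' /proc/sys/net/ipv4/tcp_mem)
#   tcp_mem_status "$mem" "$pressure"
```

With the numbers from this thread, 120000 pages against the default
pressure value of 125868 reports "near pressure (95%)", while 317331
pages against 512000 reports "ok (61%)".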

Since these VMs have 8GB of memory, I set tcp_mem to a more reasonable
value of 512000 pages, or about 2GB assuming 4KB pages. I like setting the
three values to the same number because then the kernel won't silently
enter memory pressure mode and throttle connections.
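
The page arithmetic, as a quick sketch. The function name pages_to_mib is
hypothetical, and the 4096-byte default is an assumption. Run
getconf PAGESIZE to get the real page size on your system.

```shell
# Convert a page count to MiB.
# $1 = number of pages, $2 = page size in bytes (assumed 4096 if omitted).
pages_to_mib() {
    echo $(( $1 * ${2:-4096} / 1048576 ))
}

# Usage:
#   pages_to_mib 512000 "$(getconf PAGESIZE)"
```

With 4096-byte pages, 512000 pages works out to 2000 MiB, and the default
max of 188802 pages to about 737 MiB.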

Once the tcp_mem max is reached, the kernel reports this:

TCP: out of memory -- consider tuning tcp_mem

I can then alert on that message.
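
A rough sketch of what such a check might look like. The function name
count_tcp_oom is made up, and how you feed it kernel log lines (dmesg,
journalctl -k, a syslog file) depends on your setup.

```shell
# Count kernel log lines on stdin that contain the TCP out-of-memory
# message. Prints 0 when there are none.
count_tcp_oom() {
    grep -c 'TCP: out of memory' || true
}

# Usage on a live system (may need root):
#   dmesg | count_tcp_oom
```

A monitoring check could then alert whenever the count is above zero.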

I left the other values at their defaults. Everything appears to be working
as expected now.

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com
