Greetings,

Similar to this user:
https://www.mail-archive.com/[email protected]/msg27698.html

I recently upgraded our proxy VMs from Debian 8/Jessie (kernel 3.16.0) to Devuan 2/ASCII (Debian Stretch without systemd, kernel 4.9.0). I know running haproxy on a VM is often discouraged, but we have done so for years with great success. Right now I'm stress testing the new build on ONE proxy VM doing 861 req/s and 2.26 Gbps of outbound traffic (70k pps in, 90k pps out), with quite a bit of capacity to spare. It can be done with some tweaking, but nothing much beyond what would have to be done on hardware anyway.

Our VM hosts have one Xeon E5-2687W v4 processor (12 cores, 24 logical), 256 GB RAM, and dual Intel 10G adapters: one for external traffic, one for internal. The proxy VMs are configured with 8 cores, 8 GB RAM, and two virtio adapters, both with multi-queue set to 2 (which gives me two receive queues per adapter). We're running Proxmox 5.

haproxy is a custom build of 1.8.14, built with:

    make TARGET=linux2628 USE_PCRE=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_FUTEX=1

I have each receive queue pinned to a different processor (0-3). haproxy is configured with nbproc 4 and pinned to processors 4-7. iptables with connection tracking is enabled (I couldn't see ANY performance benefit from using a stateless firewall instead). I can get near wire speed between VM hosts as well as between VM guests on the local network.

The problem we saw right away was that with any amount of traffic flowing through these new proxy builds, single-stream throughput was severely reduced. Without load, I could pull down a file at 200+ Mbps with a single stream. With load, that would drop to 10-15 Mbps, if that. This meant that 1080p videos would endlessly buffer and large images would load like they did in the 90s on dial-up. Not good. After a bunch of trial and error, I narrowed the issue down to the network layer itself.
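For completeness, the nbproc pinning looks roughly like this in haproxy.cfg. This is a sketch rather than a copy of our actual config, using 1.8's cpu-map directive to map process numbers 1-4 onto CPUs 4-7:

```
global
    nbproc 4
    cpu-map 1 4
    cpu-map 2 5
    cpu-map 3 6
    cpu-map 4 7
```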
The only thing I could find that may have pointed to what was going on was this:

    # netstat -s | grep buffer
    16889843 packets pruned from receive queue because of socket buffer overrun
    7626 packets dropped from out-of-order queue because of socket buffer overrun
    3912652 packets collapsed in receive queue due to low socket buffer

These values were incrementing a lot faster than on the old build. My research on this pointed to the r/wmem settings, which I've never adjusted before because most recommendations seem to be to leave them alone, and I could never determine that we actually needed to adjust them. Here are the sysctl settings we've been using for years:

    vm.swappiness=10
    net.ipv4.tcp_tw_reuse=1
    net.ipv4.ip_local_port_range=1024 65535
    net.core.somaxconn=10240
    net.core.netdev_max_backlog=10240
    net.ipv4.conf.all.rp_filter=1
    net.ipv4.tcp_max_syn_backlog=10240
    net.ipv4.tcp_synack_retries=3
    net.ipv4.tcp_syncookies=1
    net.netfilter.nf_conntrack_max=4194304

After doing a TON of research, starting from here:

http://fasterdata.es.net/host-tuning/linux/
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central/page/Linux%20System%20Tuning%20Recommendations

I settled on the following:

    # allow testing with buffers up to 128MB
    net.core.rmem_max = 134217728
    net.core.wmem_max = 134217728
    # increase Linux autotuning TCP buffer limit to 64MB
    net.ipv4.tcp_rmem = 4096 87380 67108864
    net.ipv4.tcp_wmem = 4096 65536 67108864

These are recommended "for a host with a 10G NIC optimized for network paths up to 200ms RTT, and for friendliness to single and parallel stream tools...", which seemed to fit our situation. However, these settings didn't make any difference.

The next thing I tried was adjusting net.ipv4.tcp_mem. This is the one setting almost everyone says to leave alone, because the kernel defaults are good enough.
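One detail worth keeping in mind when reading tcp_mem: unlike tcp_rmem/tcp_wmem, it is denominated in pages, not bytes, so the numbers look smaller than they are. A quick back-of-the-envelope conversion, assuming the usual 4 KiB page size (the default and replacement values here are the ones discussed below):

```shell
# tcp_mem is in pages, not bytes. Assuming 4 KiB pages, convert the
# Stretch defaults (94401 125868 188802) and my value (16777216):
page=4096                                                    # getconf PAGESIZE on most x86 systems
echo "default pressure: $(( 125868 * page / 1048576 )) MiB"  # ~491 MiB
echo "default max:      $(( 188802 * page / 1048576 )) MiB"  # ~737 MiB
echo "new max:          $(( 16777216 * page / 1073741824 )) GiB"  # 64 GiB
```

In other words, the stock hard limit for all TCP socket memory combined is well under 1 GB, while 16777216 pages works out to 64 GiB, which on an 8 GB VM is effectively uncapped. I'm not claiming that's the whole story, just that the units make the defaults smaller than they first appear.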
Well, adjusting this one setting is what seems to have fixed the issue for us. Here are the default values the kernel sets on Devuan / Stretch:

    net.ipv4.tcp_mem = 94401 125868 188802

On Jessie:

    net.ipv4.tcp_mem = 92394 123194 184788

And here is what I set it to:

    net.ipv4.tcp_mem = 16777216 16777216 16777216

I can reproduce the low-throughput issue by changing tcp_mem back to the defaults. I'm not even sure the other settings are necessary (still testing that).

Can anyone shed some light on why adjusting tcp_mem fixed this? Are the other settings needed / appropriate? I'm not fond of deploying anything into production with settings I've copied from the internet without fully understanding what I'm doing. Most posts on this just quote the kernel docs verbatim, and since almost everyone says "do NOT adjust tcp_mem", there isn't much documentation out there on when you SHOULD adjust it.

All I know is that with tcp_mem changed, I can run an iperf test and get over 1 Gbps even with site traffic above 2 Gbps (we have 3 Gbps available). File downloads are now snappier and get up to speed faster than before.

If anyone has some input on this, I'd really appreciate it. I'd love to understand these settings better and what the ramifications of changing them are. Thanks!

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com

