Thanks to Sasha, Baptiste and Willy for helping me.
On Thu, Jul 3, 2014 at 11:13 PM, Willy Tarreau <[email protected]> wrote:
>> The most interesting part:
>> * we did the trick to set the smp_affinity of our eth0 interface to
>> cpu 0 and haproxy on cpu 1 with taskset, BUT the soft-interrupt CPU
>> stays on cpu 1 (with haproxy). This is not what is documented for the
>> Linux kernel. We dug into the RPS and RFS network features, but they
>> are not activated in CentOS 6 by default, so they should not
>> interfere.
>
> I'm pretty sure that CPU 1 is the first thread of the first core of
> the second socket. So you're in the absolute worst situation where
> all the traffic has to transit through memory, and cache lines are
> doing ping-pong between the two sockets. The best thing to do is to
> totally stop the second socket for now.
>
> Please just verify with this:
>
>     grep '' /sys/devices/system/cpu/cpu*/topology/phy*
>
> Second, disable hyperthreading to ensure you're not running haproxy
> on one core and the network on the other thread of the same core. You'll
> be able to re-enable it once you figure out what the problem is, but there's
> no reason for wasting time with these parasites for now.

Actually I checked, and the first 6 cores are the first threads of the first socket, the next 6 are on the second socket, and the next 12 cores are the hyperthreads. So all this should not be an issue, although you are right that I should deactivate hyperthreading anyway to simplify tests.

>> I think that we have been looking way too deep into the problem and the
>> solution must be right in front of us.
>>
>> Does anyone have ideas?
>
> Could you check your network card's traffic (ideally on the switch) in
> terms of bit rate and packet rate in each direction? At 15k hits/s it depends
> a lot on the object size, especially when running on gigabit NICs, which
> are easily overloaded.

OK, so I ran new tests with a separate nginx server (using multiple workers to handle the load).
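As a side note, the affinity pinning discussed above boils down to writing a hex CPU bitmask into /proc/irq. A minimal sketch (the interface name "eth0" and the target CPU are examples; the apply loop is commented out since it needs root and real hardware):

```shell
# Sketch: compute the smp_affinity hex mask for a given CPU.
# smp_affinity is a hex bitmask of allowed CPUs: bit N set => CPU N
# may service the interrupt. Pick the CPU based on the topology shown by:
#   grep '' /sys/devices/system/cpu/cpu*/topology/physical_package_id
cpu=0
mask=$(printf '%x' $((1 << cpu)))    # cpu0 -> 1, cpu1 -> 2, cpu4 -> 10, ...
echo "cpu$cpu -> smp_affinity mask $mask"

# Applying it to every IRQ of eth0 would look like this (root required):
# for irq in $(awk -F: '/eth0/ {gsub(/ /,"",$1); print $1}' /proc/interrupts); do
#     echo "$mask" > /proc/irq/$irq/smp_affinity
# done
```

From the shell, `taskset -c 1 haproxy ...` would similarly pin the process to cpu1, though the same can be done from the haproxy config itself.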
The bottleneck clearly seems to be the network stack, especially the number of packets per second. What I did:

* went back to the standard igb driver shipped with the kernel (it gets the 8 virtual channels by default for the NIC)
* went back to 4 siege clients, trying both with and without keep-alive, using 800 concurrency per siege
* removed all the smp_affinity settings from every IRQ (actually, by default everything seems to go to cpu0)
* pinned the haproxy process to cpu1 using the cpu-map config directive
* set ethtool -C eth0 rx-usecs 500 on both the nginx and haproxy hosts (it is 3 by default)
* deactivated haproxy logging
* deactivated splice
* activated tcp-smart-connect and http-keep-alive

All in all it got me to around 25k requests/sec without keep-alive and 39k with keep-alive. It also solved the soft-interrupt load I was seeing on cpu1.

I couldn't check on the switch, but I used iptraf to get some stats:

* with very small packets we run at roughly 200k packets/sec (100k RX, 100k TX) at about 50 Mbps
* with a larger response (a default index.html file for nginx on CentOS EPEL) we almost max out the 1G NIC at around ~800 Mbps

So now the blocking part is not the IRQs anymore (they run at worst at 60% of cpu0); instead cpu1, with haproxy, is at 100%, and latency spikes to 100 ms (instead of 3 ms without load).

I am still testing with small packets. The goal is to max out the sessions per second, not the bandwidth. I can still improve the IRQ part using the 5 cores I have left on the same socket: since I have 8 virtual channels, I can divide the number of interrupts per core by 4. However, the bottleneck is now the haproxy process at 100% (mostly system). I still think that getting 25k rps without keep-alive is very low.

For Baptiste's questions:

- what is the exact reference of your CPU?
- what is the frequency of your CPU?
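(As an aside, both of these can be read without root or DMI access straight from /proc/cpuinfo; a quick sketch, Linux-only, and the exact field names vary by architecture:)

```shell
# Non-root check of the CPU model and current clock. dmidecode -t processor
# gives the DMI view of the same data, but requires root.
grep -m1 'model name' /proc/cpuinfo
grep -m1 'cpu MHz' /proc/cpuinfo
```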
From DMI (the server is an HP Gen8 DL360e):

    Version: Intel(R) Xeon(R) CPU E5-2430L 0 @ 2.00GHz
    Voltage: 1.4 V
    External Clock: 100 MHz
    Max Speed: 4800 MHz
    Current Speed: 2000 MHz
    Status: Populated, Enabled
    Upgrade: Socket LGA1356

- what is the command line you run on the "client" side (siege)?

siege -b -c 800 -t30S http://my-lb/ (I use .siegerc for keep-alive too)

- have you disabled irqbalance?

No.

- what type of network interface are you using? (and which driver)

The igb kernel driver, version 5.0.5-k.

- are you benchmarking in keep-alive mode or not?

I was not; keep-alive improves performance a bit in terms of requests per second, but I am trying not to use it for now.

Thanks for all the help, this is really interesting feedback.

--
Best,
Maxime @ Criteo

