Thanks to Sasha, Baptiste and Willy for helping me

On Thu, Jul 3, 2014 at 11:13 PM, Willy Tarreau <[email protected]> wrote:
>
>> The most interesting part:
>> * we did the trick to set the smp_affinity of our eth0 interface to
>> cpu 0 and haproxy on cpu 1 with taskset BUT the soft interrupt CPU
>> stays on cpu 1 (with the haproxy). This is not what is documented from
>> the linux kernel, we dug into the RPS and RFS network features but
>> they are not activated in Centos 6 by default so they should not
>> interfere.
>
> I'm pretty sure that CPU 1 is the first thread of the first core of
> the second socket. So you're in the absolute worst situation where
> all the traffic has to transit through memory, and cache lines are
> doing ping-pong between the two sockets. The best thing to do is to
> totally stop the second socket for now.
>
> Please just verify with this :
>
>    grep '' /sys/devices/system/cpu/cpu*/topology/phy*
>
> Second, disable hyperthreading to ensure you're not running haproxy
> on one core and the network on the other thread of the same core. You'll
> be able to re-enable it once you figure out what the problem is, but there's
> no reason for wasting time with these parasites for now.

Actually, I checked, and the first 6 CPU numbers are the first threads
of the first socket, the next 6 are on the second socket, and the next
12 are the hyperthreads.
So all this should not be the issue, although you are right that I
should deactivate hyperthreading anyway to simplify the tests.

>> I think that we have been looking way too deep in the problem and the
>> solution must be right in front of us.
>>
>> Does anyone have ideas?
>
> Could you check your network card's traffic (ideally on the switch) in
> terms of bit rate and packet rate in each direction ? At 15khps it depends
> a lot on the object size, especially when running on gigabit NICs which
> are easily overloaded.

OK, so I ran new tests with a separate nginx server (using multiple
workers to handle the load).
The bottleneck clearly seems to be the network stack, especially the
number of packets per second.

What I did:
* went back to the standard igb driver shipped with the kernel (it
gets the 8 virtual channels for the NIC by default)
* went back to 4 siege clients, trying both with and without
keep-alive, using 800 concurrency per siege
* removed all the smp_affinity settings for every IRQ (by default
everything seems to go to cpu0)
* pinned the haproxy process to cpu1 using the cpu-map config
* set ethtool -C eth0 rx-usecs 500 on both the nginx and haproxy hosts
(it is 3 by default)
* deactivated haproxy logging
* deactivated splice
* activated tcp-smart-connect and http-keep-alive
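For reference, the affinity and coalescing steps above can be sketched roughly as follows (the interface name eth0 and the IRQ-name pattern are assumptions that must match what /proc/interrupts shows on the box):

```shell
# Point every eth0 queue interrupt at cpu0 (affinity mask 1 = CPU 0).
# The awk pattern assumes the IRQ lines in /proc/interrupts mention eth0.
for irq in $(awk -F: '/eth0/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
  echo 1 > "/proc/irq/$irq/smp_affinity"
done

# Raise interrupt coalescing from the default 3us to 500us:
ethtool -C eth0 rx-usecs 500

# And in haproxy.cfg, pin process 1 to CPU 1 (global section):
#   global
#       cpu-map 1 1
```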

All in all it got me to around 25k requests/sec without keep-alive and
39k with keep-alive.

It also solved the soft-interrupt load I was seeing on cpu1.

I couldn't check on the switch but used iptraf to get some stats:

* with very small packets we get roughly 200k packets/sec (100k RX,
100k TX) at about 50 Mbps
* with a larger response (the default index.html file for nginx on
CentOS EPEL) we almost max out the 1G NIC at around ~800 Mbps

So now the blocking part is not the IRQs anymore (they run at worst at
60% of cpu0); instead cpu1 is at 100% with haproxy, and latency spikes
to 100ms (instead of 3ms without load). I am still testing with small
packets: the goal is to max out the sessions per second, not the bandwidth.

I can still improve the IRQ part using the 5 cores I have left on the
same socket: since I have 8 virtual channels, I can divide the number
of interrupts per core by 4. However, the bottleneck is now the
haproxy process at 100% CPU (mostly system time).

I still think that getting 25k rps without keep-alive is very low.

For Baptiste's questions:
- what is the exact reference of your CPU ?
- what is the frequency of your CPU ?

from DMI (The server is a HP Gen8 DL360e):
        Version:  Intel(R) Xeon(R) CPU E5-2430L 0 @ 2.00GHz
        Voltage: 1.4 V
        External Clock: 100 MHz
        Max Speed: 4800 MHz
        Current Speed: 2000 MHz
        Status: Populated, Enabled
        Upgrade: Socket LGA1356


- what is the command line you run on the "client" side (siege)
siege -b -c 800 -t30S http://my-lb/ (I use .siegerc for keep-alive too)

- have you disabled irq-balance ?
no
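One caveat worth noting (my understanding, not verified here): if irqbalance stays enabled it can periodically rewrite any smp_affinity set by hand, so when pinning IRQs manually it is safer to stop it first. A sketch, assuming the CentOS 6 init-script tools:

```shell
# Stop irqbalance before pinning IRQs by hand, otherwise it may
# rewrite smp_affinity behind our back (CentOS 6 service style assumed).
if pgrep -x irqbalance > /dev/null; then
  service irqbalance stop
  chkconfig irqbalance off   # keep it off across reboots
fi
```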

- what type of network interface are you using? (and which driver)
igb kernel driver, version 5.0.5-k

- are you benchmarking in keep-alive mode or not?
I was not; keep-alive improves performance in terms of requests per
second a bit, but I am trying not to use it for now.

Thanks for all the help, this is really interesting feedback.

-- 
Best,
Maxime @ Criteo
