Hi!

2011/7/18 Willy Tarreau <w...@1wt.eu>

> Hi,
>
> On Mon, Jul 18, 2011 at 06:42:54AM +0400, Dmitriy Samsonov wrote:
> > My test setup is three Dell r410 servers (dual Intel(R) Xeon(R) CPU X5650
>  @
> > 2.67GHz - 24 threads total, 128Gb RAM) all connected to 1Gbps network.
> >
> > One server is haproxy, configured to block all requests with
> > 'Accept-Encoding: none':
> >
> > global
> >  daemon
> > maxconn 80000
> > option forwardfor
> >  retries 10
>
> Something is missing here above, probably "defaults" and maybe a few other
> options before "option forwardfor", such as "mode http" and even a few
> timeouts.
>

Yes, the defaults section is the following:
defaults
        mode http
        maxconn 79500
        timeout client 20s
        timeout server 15s
        timeout queue  60s
        timeout connect 4s
        timeout http-request 5s
        timeout http-keep-alive 250
        #option httpclose
        option abortonclose
        balance roundrobin
        option forwardfor
        retries 10

Also, the conntrack module was loaded - a friend of mine was playing with
iptables and had not removed it. Now there is no iptables at all:

dex9 ipv4 # sysctl -a | grep conntrack | wc -l
0
dex9 ipv4 # lsmod | grep xt_ | wc -l
0
dex9 ipv4 # lsmod | grep nf_ | wc -l
0
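
If conntrack ever has to be loaded again, I will tune it instead of keeping the
desktop defaults, roughly along these lines (the values are only illustrative):

sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=600
echo 262144 > /sys/module/nf_conntrack/parameters/hashsize   # module parameter, only exists while nf_conntrack is loaded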

I followed your recommendation and set affinity for processes:
dex9 ipv4 # schedtool 16638 # haproxy's process
PID 16638: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x2

dex9 ipv4 # schedtool 3  # ksoftirqd/0
PID     3: PRIO   0, POLICY N: SCHED_NORMAL, NICE   0, AFFINITY 0x1
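
In case it matters, the binding is done roughly like this (the IRQ number is
only an example; the real one comes from /proc/interrupts):

schedtool -a 0x2 16638                 # pin haproxy to core 1 (mask 0x2)
echo 1 > /proc/irq/24/smp_affinity     # pin eth0's IRQ to core 0; 24 is just an example IRQ number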

Now in top it looks like this:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

16638 root      20   0  100m  36m  520 R   93  0.0  19:33.18 haproxy

    3 root      20   0     0    0    0 R   38  0.0   7:11.76 ksoftirqd/0


93% for haproxy and 38% for ksoftirqd/0.

I was lucky enough to reach a session rate of 66903 (max), and the average
value for Cur is around 40-42k.
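
Each of the two remote ab2 instances is started with the same command as quoted
below, only with -n 10000000:

ab2 -c 1000 -H 'Accept-Encoding: None' -n 10000000 http://192.168.0.1/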

Typical output of one of the two running ab2 instances is:

Server Software:
Server Hostname:        nohost
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1000
Time taken for tests:   470.484 seconds
Complete requests:      10000000
Failed requests:        0
Write errors:           0
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    21254.72 [#/sec] (mean)
Time per request:       47.048 [ms] (mean)
Time per request:       0.047 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   34 275.9     11   21086
Processing:     0   13  17.8     11     784
Waiting:        0    0   0.0      0       0
Total:          2   47 276.9     22   21305

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     26
  75%     28
  80%     30
  90%     37
  95%     41
  98%     47
  99%    266
 100%  21305 (longest request)

Typical output of vmstat is:
dex9 ipv4 # vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 131771328  46260  64016    0    0     2     1  865  503  1  6 94  0
 1  0      0 131770688  46260  64024    0    0     0     0 40496 6323  1  9 90  0
 1  0      0 131771920  46260  64024    0    0     0     0 47046 7212  1  8 91  0
 2  0      0 131771704  46260  64024    0    0     0     0 40864 6143  1  8 91  0
 1  0      0 131771688  46268  64024    0    0     0    12 36547 5268  1  8 91  0
 2  0      0 131771696  46268  64032    0    0     0     0 53189 8979  0  5 95  0
 3  0      0 131771584  46268  64024    0    0     0     0 31633 4025  0  4 96  0
 2  0      0 131771840  46268  64020    0    0     0     0 49723 9290  1  9 91  0
 2  0      0 131772448  46268  64028    0    0     0     0 44484 7008  1  8 91  0
 2  0      0 131771688  46276  64032    0    0     0    20 40132 4531  1  8 92  0
 2  0      0 131771456  46276  64028    0    0     0     0 36006 4445  0  8 91  0
 2  0      0 131772208  46276  64024    0    0     0     0 41325 5902  1  8 91  0
 2  0      0 131771208  46276  64032    0    0     0     0 44262 7427  1  5 94  0
 1  0      0 131771456  46276  64028    0    0     0     0 42403 5422  1  8 91  0
 2  0      0 131771944  46284  64020    0    0     0    12 46907 7419  1  7 93  0
 1  0      0 131772960  46284  64020    0    0     0     0 42772 6663  1  8 91  0
 2  0      0 131772832  46284  64024    0    0     0     0 45298 6695  1  8 91  0
 2  0      0 131772712  46284  64028    0    0     0     0 44604 5361  0  6 94  0
 1  0      0 131772464  46284  64028    0    0     0     0 39798 5105  0  4 96  0

Also, I've checked the NIC's firmware version:
dex9 ipv4 # ethtool -i eth0
driver: bnx2
version: 2.0.21
firmware-version: 6.2.12 bc 5.2.3
bus-info: 0000:01:00.0
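
Since you mention the interrupt rate depends on NIC tuning, interrupt
coalescing may be worth a look too; a quick sketch with ethtool (the -C values
are purely illustrative):

ethtool -c eth0                            # show current coalescing settings
ethtool -C eth0 rx-usecs 64 rx-frames 32   # example values only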


Moreover, I've tried launching two ab2 instances locally:
dex9 ipv4 # ab2 -c 1000 -H 'Accept-Encoding: None' -n 10000000 http://localweb/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking inbet.cc (be patient)
Completed 1000000 requests
Completed 2000000 requests
^C

Server Software:
Server Hostname:        localweb
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1000
Time taken for tests:   104.583 seconds
Complete requests:      2141673
Failed requests:        0
Write errors:           0
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    20478.13 [#/sec] (mean)
Time per request:       48.833 [ms] (mean)
Time per request:       0.049 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   38 352.7      6   21073
Processing:     0   10  49.7      7   14919
Waiting:        0    0   0.0      0       0
Total:          1   48 365.8     13   21078

Percentage of the requests served within a certain time (ms)
  50%     13
  66%     19
  75%     26
  80%     35
  90%     36
  95%     37
  98%     39
  99%     67
 100%  21078 (longest request)

Two such ab2 processes are running, both at 100%, saturating haproxy at 100%.
The 'Cur' session rate is also around 40-44k/s.

Should I get rid of the Dell R410 and replace it with a Core i5? :)) Seriously
though, are there any other tips or tricks I can try to reach that amazing
100k/s session rate?
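
One thing I still plan to try is the keep-alive test you describe below; if I
read it correctly, the change boils down to this, plus running ab2 with -k:

        option http-server-close        # close the server side, keep client keep-alive

and in the frontend, replacing the block rule with:

        redirect location / code 302 if accepts_none

then:

ab2 -k -c 1000 -H 'Accept-Encoding: None' -n 10000000 http://192.168.0.1/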



> > frontend public
> >         bind 192.168.0.1:80
> >         default_backend nginx
> >         acl accepts_none hdr(Accept-Encoding) -i none
> >         errorfile 403 /raid/emptypage.txt
> >         block if accepts_none
> >
> > backend nginx
> > server srv 127.0.0.1:80 maxconn 8192
> >
> > File /raid/emptypage.txt is an empty file made with 'touch
> > /raid/emptypage.txt'.
> >
> > I'm doing ab2 -c 1000 -H 'Accept-Encoding: None' -n 1000000
> > http://192.168.0.1/ on two other servers and get following:
> >
> > When nbproc = 1 haproxy saturates 100% of cpu core it runs at, but server
> is
> > running nice, I'm able to get reply from nginx behind by using curl on my
> > machine: curl http://192.168.0.1/, ab reports 16833 requests/second each
> and
> > longest request is around 14seconds.
>
> 16k req/s is quite low for such a server, especially since no request
> should
> be forwarded. You should check how the user/system CPU usage is spread, I
> suspect there could be an important part spent in the system. Such a test
> should better report numbers in the 100k range.
>
> Also, 14 seconds to respond is simply unacceptable. Something is not
> working at all. 14 milliseconds would even be high for this type of
> processing but could be explained by packet queuing and interrupt
> rate depending on the NIC tuning.
>
> > When I change nbproc to higher values (maximum is 24 as there 24 threads
> > total) I can see ksoftirq/0 process saturating cpu core, network becomes
> > slow on server, ab reports same 16k-17k requests/second for each client,
> but
> > longest request is always around 20-30 seconds.
>
> At such loads, you should not see ksoftirq trigger. 16kreq/s, assuming
> there is no keep-alive will basically result in 7 packets per request,
> (4 in, 3 out) so you're running around 112 kpps, or 64k IN and 48k out.
>
> Could you check with "vmstat 1" your interrupt rate ? Wouldn't your NIC
> be a BNX2 with an old firmware such as 1.9.6 which is known for causing
> a high number of packet losses ? (check with ethtool -i eth0).
>
> > I've seen such things with ksoftirq/0 running at 100% and network is
> almost
> > down during DDoS attacks in case of too many iptables rules but what is
> > happening now?
>
> Indeed, if you have many iptables rules, then your machine will be very
> sensitive to SYN floods because for each SYN it will be necessary to find
> the matching rule. Do you have ip_conntrack / nf_conntrack loaded ? I
> assume that it's the case. Did you tune it ? Default settings are fine
> for a desktop usage but not for a server. It is possible that all the
> time is wasted in the conntrack hash table lookup.
>
> > And what number to use at nbproc? Is it ok to have haproxy
> > running at 100%?
>
> It can be under extreme situations (eg: ddos) but it should not be the
> norm, otherwise your scaling abilities are seriously limited. Nbproc
> should normally be 1. With proper tuning this will give you much more
> raw performance than what you can get in multi-process and poor tuning.
> To give you an idea, at Exceliance, we can get 300kconn/s out of a single
> core of a dual-core i5 3 GHz. You're at 16k.
>
> What is very important too is to bind network interrupts and processes
> to different cores sharing the same L2 cache. If you force all your NIC
> interrupts to core 0 and bind haproxy to core 1, chances are that they
> will be on the same cache (check in /proc/cpuinfo). This is how I get
> the most performance, with IRQs taking between 50 and 80% of one core,
> and haproxy 100% of another core.
>
> > It looks like I can have 30k requests per second in my
> > setup, is there any way to make it higher? I've done some basic tuning
> like
> > tcp_max_tw_buckets = 1024*1024, tcp_tw_reuse = 1, tcp_max_syn_backlog =
> > 30000. Am I running out of options?
>
> There are a number of other options, such as net.core.somaxconn which
> also limits the backlog (128 by default!). But in your case, the apparent
> lack of NIC/process binding can be one cause, and if iptables is there,
> it surely is another one.
>
> It is easy to verify if iptables is likely involved : for this you'll
> enable keep-alive. Add this to your haproxy config :
>
>     option http-server-close
>
> And replace :
>
>     block if accepts_none
>
> with :
>
>     redirect location / code 302 if accepts_none
>
>
> Then use "ab -k" instead of "ab" to run your tests. It will try to maintain
> connections open as long as possible. The only difference is that it will
> see status 302 instead of 403. But that's not a problem for a test.
>
> If you see that the numbers increase by a factor of approximately two, it
> will probably mean that you're bound by the packet count and/or by the CPU.
> If your numbers increase a lot more (eg: 5-10 times) then it will be very
> likely due to the fact that you don't stress conntrack anymore and don't
> spend your CPU cycles doing connection lookups.
>
> I really encourage you to try this as it's a simple test. Hint: on my
> 6 y-o laptop, I'm already getting your numbers above, and testing with
> keep-alive doubles the performance.
>
> Regards,
> Willy
>
>
