Re: tcp resets on reload haproxy

Mostowiec Dominik Sun, 01 Apr 2012 04:44:51 -0700

Hi,

>>     maxconn 163937
> What's the reason for this magic number ?
It's random :-)


> Did you notice that your request packet (the 4th) was lost on the network ?
> I guess you captured on the siege_host

I captured this on the loadbalancer host :-( It's not network loss.

> you did not have -vv nor -S so some info are missing
I recorded this to a file, with -vv:

11:20:58.713922 IP (tos 0x0, ttl 64, id 7370, offset 0, flags [DF], proto TCP (6), length 48)
    siege_host.46589 > loadbalancer.8123: Flags [S], cksum 0xe536 (correct), seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
11:20:58.713951 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    loadbalancer.8123 > siege_host.46589: Flags [S.], cksum 0x683e (incorrect -> 0x7e18), seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
11:20:58.714687 IP (tos 0x0, ttl 64, id 7371, offset 0, flags [DF], proto TCP (6), length 40)
    siege_host.46589 > loadbalancer.8123: Flags [.], cksum 0xdf59 (correct), seq 1, ack 1, win 913, length 0
11:20:58.714894 IP (tos 0x0, ttl 64, id 7372, offset 0, flags [DF], proto TCP (6), length 190)
    siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150
11:21:00.717226 IP (tos 0x0, ttl 64, id 7373, offset 0, flags [DF], proto TCP (6), length 40)
    siege_host.46589 > loadbalancer.8123: Flags [F.], cksum 0xdec2 (correct), seq 151, ack 1, win 913, length 0
11:21:00.717254 IP (tos 0x0, ttl 64, id 17608, offset 0, flags [DF], proto TCP (6), length 40)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect -> 0xe205), seq 1, ack 1, win 229, length 0
11:21:01.723109 IP (tos 0x0, ttl 64, id 7374, offset 0, flags [DF], proto TCP (6), length 190)
    siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150
11:21:01.723135 IP (tos 0x0, ttl 64, id 17609, offset 0, flags [DF], proto TCP (6), length 40)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect -> 0xe15e), seq 1, ack 152, win 245, length 0
11:21:01.724902 IP (tos 0x0, ttl 64, id 17610, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0xbbac), seq 1:1461, ack 152, win 245, length 1460
11:21:01.724929 IP (tos 0x0, ttl 64, id 17611, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x2b9c), seq 1461:2921, ack 152, win 245, length 1460
11:21:01.724936 IP (tos 0x0, ttl 64, id 17612, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x2b79), seq 2921:4381, ack 152, win 245, length 1460
11:21:01.724942 IP (tos 0x0, ttl 64, id 17613, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x1e97), seq 4381:5841, ack 152, win 245, length 1460
11:21:01.724948 IP (tos 0x0, ttl 64, id 17614, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0xd630), seq 5841:7301, ack 152, win 245, length 1460
11:21:01.724952 IP (tos 0x0, ttl 64, id 17615, offset 0, flags [DF], proto TCP (6), length 389)
    loadbalancer.8123 > siege_host.46589: Flags [P.], cksum 0x6993 (incorrect -> 0xec72), seq 7301:7650, ack 152, win 245, length 349
11:21:01.724981 IP (tos 0x0, ttl 64, id 17616, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x3378), seq 7650:9110, ack 152, win 245, length 1460
11:21:01.725003 IP (tos 0x0, ttl 64, id 17617, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x9de5), seq 9110:10570, ack 152, win 245, length 1460
11:21:01.725012 IP (tos 0x0, ttl 64, id 17618, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x6aad), seq 10570:12030, ack 152, win 245, length 1460
11:21:01.725020 IP (tos 0x0, ttl 64, id 17619, offset 0, flags [DF], proto TCP (6), length 1500)
    loadbalancer.8123 > siege_host.46589: Flags [P.], cksum 0x6dea (incorrect -> 0x534b), seq 12030:13490, ack 152, win 245, length 1460
11:21:01.725278 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0
11:21:01.725295 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0
11:21:01.725302 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0

The request is retransmitted:

---
010.177.050.158.46589-010.177.032.028.08123: GET / HTTP/1.1
Host: clouddev.onet:8123
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close


010.177.050.158.46589-010.177.032.028.08123: GET / HTTP/1.1
Host: clouddev.onet:8123
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close

....
---
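
The timing can be read straight off the capture above. As a rough sketch in Python (the three timestamps are copied from the trace; the helper name seconds_between is only illustrative), this computes how long after the first request segment the client sent its FIN and how long after it the same segment was seen again:

```python
from datetime import datetime


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds between two tcpdump HH:MM:SS.ffffff timestamps (same day assumed)."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return delta.total_seconds()


# Timestamps copied from the capture.
first_request = "11:20:58.714894"  # [P.] seq 1:151, first transmission
client_fin = "11:21:00.717226"     # [F.] seq 151 from siege_host
retransmit = "11:21:01.723109"     # [P.] seq 1:151 seen a second time

fin_delay = seconds_between(first_request, client_fin)
retransmit_delay = seconds_between(first_request, retransmit)

print(f"FIN after        {fin_delay:.6f} s")
print(f"retransmit after {retransmit_delay:.6f} s")
```

So the client closes its side about 2 s after the request, and the request segment shows up again about 3 s after its first transmission.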


> Wow 16 procs ! I don't know what you intend to do, but it will generally
> not bring anything and might even reduce the performance.

I have a 2x6-core server (24 cores with HT).

-
Regards
Dominik

________________________________________
From: Willy Tarreau [[email protected]]
Sent: 31 March 2012 18:58
To: Mostowiec Dominik
Cc: [email protected]
Subject: Re: tcp resets on reload haproxy

Hi Dominik,

On Fri, Mar 30, 2012 at 03:52:20PM +0200, Mostowiec Dominik wrote:
> Hi,
> Thanks for the response.
>
> I have another problem:
>
> 11:20:58.713922 IP siege_host.46589 > loadbalancer.8123: Flags [S], seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP loadbalancer.8123 > siege_host.46589: Flags [S.], seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP siege_host.46589 > loadbalancer.8123: Flags [.], ack 1, win 913, length 0
> 11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150
> 11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, ack 1, win 913, length 0
> 11:21:00.717254 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 1, win 229, length 0

Did you notice that your request packet (the 4th) was lost on the network ?

That's one reason why we always want to set timeouts above 3 sec (generally
4 or 5), so that it covers one TCP retransmit. I guess you captured on the
siege_host (you did not have -vv nor -S so some info are missing) ? Also,
you should be careful with the system config on siege_host, as it does not
have SACK enabled, which makes things worse when your network is lossy.

This packet loss issue is the reason for the pause you observe since the
request never reaches haproxy. If you increase your siege timeout above 3s
you'll see that many requests take 3s to be processed due to the retransmit
and that other ones still fail. You really need to find what is causing
these losses and to fix that, it's impossible to run a benchmark on a lossy
network! Check your switches and your NICs. Ensure you're not running with
an old bnx2 NIC with an old firmware.

BTW I have a few comments about your config :

> global
>     maxconn 163937

What's the reason for this magic number ?

>     user haproxy
>     group haproxy
>     daemon
>     nbproc 16

Wow 16 procs ! I don't know what you intend to do, but it will generally
not bring anything and might even reduce the performance.

> defaults
>     log global
>     mode        http
>     option      httplog
>     option      dontlognull
>     option      forwardfor
>     retries     1
>     contimeout  1s

 < 3s timeout, see above

>     clitimeout  33s
>     srvtimeout  33s
>     grace 7s

grace serves no purpose these days, especially if all instances
share the same setting (the goal was to make some instances stop
before other ones to fail external health checks).

I see that you have no default maxconn, so your frontends will still
be limited by the default maxconn (2000).

(...)
> Haproxy is started with "-n 163937 -N 163937" options.

OK so -N sets it. Still strange value anyway.

> I attached stats for test when nbproc is set to '1'.

Hmmm the load was very low :

   691 MB/20k conn = 34kB per connection
   At peak you reached 34kB*850 sess/s = 29 MB/s ~= 250 Mbps
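
For anyone who wants to redo this estimate with their own stats, here is a minimal Python sketch of the same arithmetic. It assumes decimal units (1 MB = 1000 kB) and that "20k conn" means 20000 connections; the function names are only illustrative. The exact result comes out a little below the rounded ~250 Mbps quoted above, since it does not count any protocol overhead.

```python
def per_connection_kb(total_mb: float, connections: int) -> float:
    """Average payload per connection in kB, assuming 1 MB = 1000 kB."""
    return total_mb * 1000.0 / connections


def peak_mbps(kb_per_conn: float, sessions_per_sec: float) -> float:
    """Peak throughput in Mbit/s for a given per-connection size and session rate."""
    kb_per_sec = kb_per_conn * sessions_per_sec
    return kb_per_sec * 8 / 1000.0  # kB/s -> kbit/s -> Mbit/s


# Figures taken from the stats discussed above.
avg_kb = per_connection_kb(691, 20_000)  # about 34 kB per connection
peak = peak_mbps(34, 850)                # using the rounded 34 kB figure

print(f"{avg_kb:.2f} kB per connection")
print(f"{peak:.1f} Mbit/s at 850 sess/s")
```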

It's very concerning that you're experiencing network losses at this
rate. Just a hint, it's more likely that the losses are located on
the siege host or between it and the network than on the haproxy
host, because when you run haproxy on a lossy machine you generally
observe failed health checks, which you didn't have here during the
test.

> Is something wrong with my configuration ?

Not particularly, leaving aside the strange numbers.

Regards,
Willy
