On Thu, Jul 25, 2019 at 02:36:49AM +0200, Elias Abacioglu wrote:
> Hi Willy,
> 
> This would explain the 503s
> ```
>   # change a 503 response into a 204 (a friendly decline).
>   errorfile 503 /etc/haproxy/errors/204.http
> 
>   acl is_disable path_beg /getuid/rogue-ad-exchange
>   # http-request deny defaults to 403, change it to a 503,
>   # which is a masked 204 since haproxy doesn't have a 204 errorfile.
>   http-request deny deny_status 503 if is_disable
> ```
> also
> ```
> backend robotstxt
>   errorfile 503 /etc/haproxy/errors/200.robots.http
> backend crossdomainxml
>   errorfile 503 /etc/haproxy/errors/200.crossdomain.http
> backend emptygif
>   errorfile 503 /etc/haproxy/errors/200.emptygif.http
> ```
> Basically I use 503 if I want to block a sender in a friendly way (i.e.
> making them believe we just declined the transaction) and to host 3 tiny
> files: robots.txt, crossdomain.xml and empty.gif.

But I'm pretty sure I've seen 503s *received* by haproxy, indicating
that the next component sent them, so these cannot be the ones you
produce with your configuration.

> It felt excessive to set up redundant webservers for a total of 703 bytes
> of files, and it also felt wasteful to have them in the java backend. So I
> cheated with haproxy's errorfiles.

Oh don't worry, you're not the only one to do that :-)  I've even seen
an auto-generated config using one backend per file, with an error file
matching the contents of each file of a directory, to replace a web
server!
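For the record, the trick works because a backend with no servers
returns a 503, which the errorfile then replaces with a complete raw
HTTP response read from disk. A quick sketch of the idea (backend name
is made up, the path is the one from your config):

```
# serverless backend: every request ends in a 503, which the errorfile
# below replaces with the full 200 response stored on disk
backend be_robots
  errorfile 503 /etc/haproxy/errors/200.robots.http
```

where /etc/haproxy/errors/200.robots.http is itself a raw HTTP
response, e.g.:

```
HTTP/1.0 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

User-agent: *
Disallow: /
```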

> So I don't think that the 503s cause retries for our clients; it's just me
> abusing haproxy.

I'm really speaking about 503s being received by haproxy and delivered
as 503s to the clients, not about 503s in the logs that were in fact
rewritten differently. Look here:

10:51:13.776098 recvfrom(44797, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776184 recvfrom(19524, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776272 recvfrom(57869, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776391 recvfrom(35693, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55
10:51:13.776613 recvfrom(8041, "HTTP/1.1 503 Service Unavailable"..., 16320, 0, NULL, NULL) = 55

then:

10:51:13.844586 sendto(61292, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844617 sendto(62213, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844646 sendto(62685, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
10:51:13.844672 sendto(65490, "HTTP/1.1 503 Service Unavailable"..., 112, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 112
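By the way, if you want to capture the same kind of trace on your side,
something like this should do (assuming a single haproxy process):

```
# -tt prints wall-clock timestamps with microseconds, as above
strace -tt -e trace=recvfrom,sendto -p "$(pidof haproxy)"
```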

So this is why I was asking.

> We receive transactional requests, ad exchanges sending us requests.

OK so such services generally do not retry.

> Also real browsers connecting to us when cookie syncing.

OK.

> So for the transactional traffic we want keep-alive, so that the clients
> send multiple HTTP requests per connection.

Of course.

> And for the browser clients we want to close the connection to the client
> after its request+response.
> So the browser clients' backend has "option forceclose", which would
> explain the short connections.

OK, makes sense.

> Currently we have "http-reuse safe" in the defaults section and "http-reuse
> never" in a tcp mode listener that forwards all :443 traffic to another set
> of haproxies that have more cores and do TLS termination. And this is to
> not mess up the X-Forwarded-For headers.

There is no http-reuse in TCP mode; you probably even get a warning.

> I will try "http-reuse always" in the defaults, but not in the tcp mode
> listener, as we rely on X-Forwarded-For.

It will have no effect other than emitting a warning for your TCP mode
listener. Additionally, reuse is per request, so your X-Forwarded-For
header will remain valid since each request emits its own XFF header.
Reuse is only about reusing a keep-alive connection.
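Put differently, a sketch of what I mean (assuming "option forwardfor"
is what adds your header):

```
defaults
  mode http
  # appended to every forwarded request, whether the server-side
  # connection was freshly opened or picked from the idle pool
  option forwardfor
  # reuse decisions are made per request, not per client connection
  http-reuse always
```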

> Even if I get better performance, it still wouldn't answer why the HAProxy
> CPU usage would increase in v2.0 compared to v1.7 with the same config.

That's why I was asking whether or not the 503s can induce client
retries.

> Assuming that "http-reuse always" might help performance in 2.0, it's
> not fair to compare a better-tuned v2.0 against a less-tuned v1.7.

That's not my goal. I want to make sure we're not accumulating lots of
unused server-side connections in the server pools, which could in turn
make the servers sick and deliver 503s. With reuse safe this can definitely
happen; with reuse always it will not. In fact I'm really interested in
knowing whether you still receive lots of 503s like this, and whether you
have that many concurrent connections. In your trace I'm seeing file
descriptors as high as approximately 84000, and if for any reason this is
not normal, it could explain a difference. We could even imagine that
there are connect retries on the servers, which also increase the load.
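If you want to put a bound on what the idle pools can accumulate while
you test, there is "pool-max-conn" on the server line (1.9 and above);
a sketch with made-up names and addresses:

```
backend transactional
  # connect retries also add load; this is the proxy-level setting
  retries 3
  # keep at most 100 idle connections to this server for reuse
  server app1 192.0.2.10:8080 pool-max-conn 100
```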

If you can, it would be useful to see the output of "perf top" run in
parallel with 1.7 and with 2.0. We may discover something totally
wrong in 2.0.
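Something as simple as this on each box would already help (assuming a
single haproxy process):

```
# sample live CPU hotspots; run once on the 1.7 box and once on 2.0
perf top -p "$(pidof haproxy)"
```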

Willy
