Re: Help determining where the bottleneck is

Willy Tarreau Sun, 29 Jan 2012 05:27:23 -0800

Hi Steve,

On Tue, Jan 24, 2012 at 08:55:15AM -0800, Steve V wrote:
> Good morning,
> 
> Much love for haproxy and many thanks to all who have worked on and
> contributed to it.  We have been using it for several years without issue.
> However, we have been doing load testing lately and there appears to be a
> bottleneck.  It may not even have to do with haproxy (i dont think it does)
> but i need to double check anyways just to be thorough and cover all our
> bases.
> 
> Hardware: VM running on ESXi, it has 2gigs RAM allocated to it, and 2 CPU's
> GuestOS: CentOS 5
> Haproxy version: 1.4.8 (however, we just upgraded to 1.4.19 last night)
> 
> Problem: "second_proxy" is getting hammered by a load test, site
> performance decreases to the point where the site is barely usable and the
> majority of pages time out.  however, go to a different site that is in the
> same haproxy config listening on "http_proxy" going to the same backend
> server, and the site comes up fine and fast.  it seems like something is
> being throttled or queued somewhere.  its possible that it could be an
> issue behind haproxy on the app servers, but i just want to make sure there
> is nothing i need to tweak in my config.
> 
> Here is a snapshot of the haproxy stats page for the slow pool
> "second_proxy" http://tinypic.com/r/15887qf/5


Did you tune any sysctl on your system?
Your snapshot reports a peak of 1600 conns/second, but the default kernel
settings (somaxconn 128 and tcp_max_syn_backlog 1024) make this hard to
reach, so it's very possible that the socket queue is simply full. I
usually set both between 10000 and 20000 with good success.
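
As a rough sketch of that tuning (run as root; 10000 is simply the low end
of the range mentioned above, not a measured optimum for this machine):

        # show the current values first
        sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog

        # raise both queues
        sysctl -w net.core.somaxconn=10000
        sysctl -w net.ipv4.tcp_max_syn_backlog=10000

To keep the values across reboots, the same two keys can go into
/etc/sysctl.conf. The listen backlog is applied when the socket starts
listening, so haproxy most likely needs a restart before the new
somaxconn has any effect.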

There is something you can try to detect whether haproxy still accepts
connections fine: simply try to connect to the stats URL on the unresponsive
port. If the stats display properly, then you're stuck on the servers. If the
stats do not respond either, then the connection is not being accepted.
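
One way to run that check from a shell during the load test, assuming stats
are enabled on that listener with the default URI (/haproxy?stats). The
address and port below are placeholders, not values from your setup:

        # give up after 5 seconds; print the HTTP status and TCP connect time
        curl -s -o /dev/null -m 5 \
             -w '%{http_code} connect=%{time_connect}s\n' \
             'http://192.0.2.10:80/haproxy?stats'

A quick 200 would point at the servers. A timeout or a very long connect
time would suggest the connection never got accepted.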

Be careful, you have no "maxconn" setting in the "defaults" section, and by
default a listen uses 2000. Your snapshot indicates that this limit was not
reached, but I wanted to let you know it will be the next issue once this
one is resolved.
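
For illustration only, setting it explicitly could look like this. The figure
is a placeholder to be sized against your own load, and the global maxconn
(8096 in your config) still caps the process as a whole:

        defaults
                maxconn     8000    # assumed value, replaces the implicit 2000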

> here is my haproxy.cfg
> 
> global
>         maxconn     8096
>         daemon
>         nbproc      1
>         stats socket /var/run/haproxy.stat
> 
> defaults
>         clitimeout  600000
>         srvtimeout  600000

Do you realize that this is 10 minutes (600000 ms, and we're speaking HTTP
here)?

Regards,
Willy
