Re: Reducing HAProxy System Time

Willy Tarreau Mon, 18 May 2015 19:59:39 -0700

Hi Robert,

On Mon, May 18, 2015 at 05:17:29PM -0700, Robert Brooks wrote:
> Hi,
> 
> I am looking to get more performance out of a host running haproxy-1.5.12.
> 
> The host is running Ubuntu 12.04 (kernel 3.13.0-46-generic) with haproxy
> binaries from Vince Bernat's ppa (haproxy_1.5.12-1ppa1~precise_amd64.deb).
> 
> The hardware is an HP DL360, with a 4 core Intel Xeon E5-2609 CPU @ 2.40GHz
> and 8GB RAM.
> 
> Hatop shows roughly 7000 request/sec, and top shows
> 
> Cpu0  : 20.9%us, 29.4%sy,  0.0%ni, 49.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi, 15.6%si,  0.0%st
> Cpu2  :  2.3%us,  1.3%sy,  0.0%ni, 85.8%id,  0.0%wa,  0.0%hi, 10.6%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 87.1%id,  0.0%wa,  0.0%hi, 12.9%si,  0.0%st
> Mem:   8134648k total,  1411080k used,  6723568k free,   278336k buffers
> Swap:  8352252k total,        0k used,  8352252k free,   421120k cached
> 
> HAProxy is a single process mapped to cpu0; NIC interrupts are on CPUs 1-3.


That's visible :-)

> The HTTP responses are small (0-2kB), as the biggest source of traffic is
> RTB traffic, which generates a lot of small, quick responses. Keep-alives are
> in use and "option http-keep-alive" is configured, which has reduced the
> system load, as the cost of rapidly initiating TCP sessions was having an
> impact on performance.
> 
> I have tried enabling tcp splicing, I am not sure if it would be helpful in
> this case, however haproxy seems reluctant to use it even with "option
> splice-request" and "option splice-response" in the defaults section of the
> config. I guess this is possibly due to "[OPTIM] stream_sock: don't use
> splice on too small payloads"?

It's useless at such sizes. A rule of thumb is that splicing will not be
used at all for anything that completely fits in a buffer since haproxy
tries to read a whole response at once and needs to parse the HTTP headers
anyway. In general, the cost of the splice() system calls compared to
copying data sees a break-even around 4-16kB depending on arch, memory
speed, setup and many factors. I find splice very useful on moderately
large objects (>64kB) at high bit rates (10-60 Gbps). On most gigabit
NICs, it's inefficient because such NICs have limited capabilities which
make them less efficient at performing multiple DMA accesses for a single
packet (e.g. no hardware scatter-gather to enable GSO/TSO).
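
The rule of thumb above can be sketched as a small decision function. This
is only an illustration and not haproxy's actual logic: the function name is
hypothetical, the 16384-byte buffer size is an assumption (haproxy's default
tune.bufsize), and the threshold simply takes the upper end of the 4-16kB
break-even range mentioned above.

```python
# Illustrative sketch only; names and thresholds are assumptions, not haproxy code.
BUFSIZE = 16384          # assumed buffer size (default tune.bufsize)
BREAK_EVEN = 16 * 1024   # assumed: upper end of the 4-16kB break-even range


def splice_worthwhile(payload_bytes, bufsize=BUFSIZE, break_even=BREAK_EVEN):
    """Return True if splicing could plausibly pay off for this payload size."""
    # Anything that fits entirely in one buffer is read at once and copied.
    if payload_bytes <= bufsize:
        return False
    # Below the break-even point the splice() calls cost more than the copy.
    return payload_bytes > break_even


if __name__ == "__main__":
    for size in (2 * 1024, 8 * 1024, 64 * 1024):
        print(size, splice_worthwhile(size))
```

With these assumed values, the 0-2kB responses described in the question
never qualify, which matches the behaviour observed.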

> Strace shows the system call time breaking down like this:
> 
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  19.83    0.048752           1     36092      4709 recvfrom
>  18.02    0.044313           2     18898         8 sendto
>  10.79    0.026518           3      7701      7701 connect
>  10.20    0.025085           1     19320           epoll_ctl
>  10.07    0.024772           1     23321           setsockopt
>   8.55    0.021028           2     11419       140 accept4
>   8.34    0.020505           2     10174           close
>   5.98    0.014711           2      7701           socket
>   3.52    0.008648           2      4768       257 shutdown
>   3.30    0.008118           1      7701           fcntl
>   0.91    0.002246           1      1588           brk
>   0.47    0.001159           8       149           epoll_wait
>   0.01    0.000023           1        23           getsockopt
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.245878                148855     12815 total
> 
> Happy to share a redacted config. Are there any general recommendations for
> a workload like this? Do the numbers look sane?

Nothing apparently wrong here, though the load could look a bit high
for that traffic (a single-core Atom N270 can do that at 100% CPU).
You should run a benchmark to find your system's limits. The CPU load
in general is not linear since most operations will be aggregated when
the connection rate increases. Still you're running at approx 50% CPU
and a lot of softirq, so I suspect that conntrack is enabled on the
system with default settings and not tuned for performance, which may
explain why the numbers look a bit high. But nothing to worry about.
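
One way to check the conntrack suspicion above is to look at what the kernel
exposes under /proc. The sketch below is a minimal example, not an official
tool: the function name and the proc_root parameter are hypothetical, and it
assumes the module is named nf_conntrack and exposes the usual
nf_conntrack_count and nf_conntrack_max sysctls.

```python
import os


def conntrack_status(proc_root="/proc"):
    """Report whether nf_conntrack appears loaded, plus table usage if readable."""
    status = {"loaded": False, "count": None, "max": None}
    # /proc/modules lists loaded modules, one per line, module name first.
    try:
        with open(os.path.join(proc_root, "modules")) as f:
            status["loaded"] = any(
                line.split()[:1] == ["nf_conntrack"] for line in f
            )
    except OSError:
        pass
    # The sysctl files also cover the case where conntrack is built in
    # rather than loaded as a module (assumption: standard sysctl names).
    for key in ("count", "max"):
        path = os.path.join(
            proc_root, "sys", "net", "netfilter", "nf_conntrack_" + key
        )
        try:
            with open(path) as f:
                status[key] = int(f.read().strip())
            status["loaded"] = True
        except (OSError, ValueError):
            pass
    return status


if __name__ == "__main__":
    print(conntrack_status())
```

If "loaded" comes back true, comparing "count" with "max" under load gives a
first hint as to whether the default table size is being approached.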

Willy
