Hi,

I am looking to get more performance out of a host running haproxy-1.5.12.

The host is running Ubuntu 12.04 (kernel 3.13.0-46-generic) with haproxy
binaries from Vincent Bernat's PPA (haproxy_1.5.12-1ppa1~precise_amd64.deb).

The hardware is an HP DL360, with a 4 core Intel Xeon E5-2609 CPU @ 2.40GHz
and 8GB RAM.

Hatop shows roughly 7000 requests/sec, and top shows:

Cpu0  : 20.9%us, 29.4%sy,  0.0%ni, 49.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi, 15.6%si,  0.0%st
Cpu2  :  2.3%us,  1.3%sy,  0.0%ni, 85.8%id,  0.0%wa,  0.0%hi, 10.6%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 87.1%id,  0.0%wa,  0.0%hi, 12.9%si,  0.0%st
Mem:   8134648k total,  1411080k used,  6723568k free,   278336k buffers
Swap:  8352252k total,        0k used,  8352252k free,   421120k cached

HAProxy is a single process mapped to cpu0; NIC interrupts are on CPUs 1-3.

The HTTP responses are small (0-2 kB), as the biggest source of traffic is
RTB traffic, which generates a lot of small, quick responses. Keep-alives are
in use and "option http-keep-alive" is configured, which has reduced the
system load, as the cost of rapidly initiating new TCP sessions was having an
impact on performance.

I have tried enabling TCP splicing. I am not sure it would be helpful in
this case, but haproxy seems reluctant to use it even with "option
splice-request" and "option splice-response" in the defaults section of the
config. I guess this is possibly due to "[OPTIM] stream_sock: don't use
splice on too small payloads"?

A strace summary shows the system call time breaking down like this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 19.83    0.048752           1     36092      4709 recvfrom
 18.02    0.044313           2     18898         8 sendto
 10.79    0.026518           3      7701      7701 connect
 10.20    0.025085           1     19320           epoll_ctl
 10.07    0.024772           1     23321           setsockopt
  8.55    0.021028           2     11419       140 accept4
  8.34    0.020505           2     10174           close
  5.98    0.014711           2      7701           socket
  3.52    0.008648           2      4768       257 shutdown
  3.30    0.008118           1      7701           fcntl
  0.91    0.002246           1      1588           brk
  0.47    0.001159           8       149           epoll_wait
  0.01    0.000023           1        23           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.245878                148855     12815 total
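
The counts in the summary above can be turned into a few ratios to make
them easier to judge. What follows is only a rough sketch in Python using
the figures from the table; the names (SYSCALLS, totals, per_accept) are
illustrative. It assumes that the accept4 errors are accepts that returned
no connection, and that the connect "errors" are simply non-blocking
connects reporting that they are in progress. Neither assumption is
confirmed by the strace output itself.

```python
# Counts copied from the strace summary: syscall -> (calls, errors).
SYSCALLS = {
    "recvfrom":   (36092, 4709),
    "sendto":     (18898, 8),
    "connect":    (7701, 7701),
    "epoll_ctl":  (19320, 0),
    "setsockopt": (23321, 0),
    "accept4":    (11419, 140),
    "close":      (10174, 0),
    "socket":     (7701, 0),
    "shutdown":   (4768, 257),
    "fcntl":      (7701, 0),
    "brk":        (1588, 0),
    "epoll_wait": (149, 0),
    "getsockopt": (23, 0),
}


def totals(table):
    """Return (total calls, total errors) across all syscalls."""
    calls = sum(c for c, _ in table.values())
    errors = sum(e for _, e in table.values())
    return calls, errors


def per_accept(table, name):
    """Calls to `name` per accept4 call that did not return an error.

    Assumption: every accept4 error is an accept that yielded no connection.
    """
    accepts, accept_errors = table["accept4"]
    return table[name][0] / (accepts - accept_errors)


if __name__ == "__main__":
    calls, errors = totals(SYSCALLS)
    print(f"total calls={calls} errors={errors}")
    for name in ("connect", "recvfrom", "sendto", "setsockopt", "close"):
        ratio = per_accept(SYSCALLS, name)
        print(f"{name:10s} {ratio:.2f} per accepted connection")
    # How much work is batched into each wakeup of the event loop.
    print(f"calls per epoll_wait: {calls / SYSCALLS['epoll_wait'][0]:.0f}")
```

Read this way, the table shows roughly two server-side connects for every
three accepted client connections during the trace window, which may be
worth comparing against how much keep-alive reuse is expected.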

I am happy to share a redacted config. Are there any general recommendations
for a workload like this? Do the numbers look sane?

Regards,

Rob
