Hi, I am looking to get more performance out of a host running haproxy-1.5.12.
The host is running Ubuntu 12.04 (kernel 3.13.0-46-generic) with haproxy binaries from Vincent Bernat's PPA (haproxy_1.5.12-1ppa1~precise_amd64.deb). The hardware is an HP DL360 with a 4-core Intel Xeon E5-2609 CPU @ 2.40GHz and 8GB of RAM.

hatop shows roughly 7000 requests/sec, and top shows:

  Cpu0 : 20.9%us, 29.4%sy, 0.0%ni, 49.7%id, 0.0%wa, 0.0%hi,  0.0%si, 0.0%st
  Cpu1 :  0.0%us,  0.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 15.6%si, 0.0%st
  Cpu2 :  2.3%us,  1.3%sy, 0.0%ni, 85.8%id, 0.0%wa, 0.0%hi, 10.6%si, 0.0%st
  Cpu3 :  0.0%us,  0.0%sy, 0.0%ni, 87.1%id, 0.0%wa, 0.0%hi, 12.9%si, 0.0%st
  Mem:  8134648k total, 1411080k used, 6723568k free,  278336k buffers
  Swap: 8352252k total,       0k used, 8352252k free,  421120k cached

HAProxy runs as a single process pinned to cpu0, and the NIC interrupts are handled on cpus 1-3.

The HTTP responses are small (0-2kB), because the biggest source of traffic is RTB traffic, which generates a lot of small, quick responses. Keep-alives are in use and "option http-keep-alive" is configured. This has reduced the system load, since the cost of rapidly initiating TCP sessions was hurting performance.

I have tried enabling TCP splicing. I am not sure whether it would help in this case, but haproxy seems reluctant to use it even with "option splice-request" and "option splice-response" in the defaults section of the config. I suspect this may be due to "[OPTIM] stream_sock: don't use splice on too small payloads"?

strace shows the following breakdown of system calls:
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   19.83    0.048752           1     36092      4709 recvfrom
   18.02    0.044313           2     18898         8 sendto
   10.79    0.026518           3      7701      7701 connect
   10.20    0.025085           1     19320           epoll_ctl
   10.07    0.024772           1     23321           setsockopt
    8.55    0.021028           2     11419       140 accept4
    8.34    0.020505           2     10174           close
    5.98    0.014711           2      7701           socket
    3.52    0.008648           2      4768       257 shutdown
    3.30    0.008118           1      7701           fcntl
    0.91    0.002246           1      1588           brk
    0.47    0.001159           8       149           epoll_wait
    0.01    0.000023           1        23           getsockopt
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.245878                148855     12815 total

I am happy to share a redacted config. Are there any general recommendations for a workload like this? Do the numbers look sane?

Regards,
Rob

