Hi,
I ran some more tests, this time with taskset, to see how performance is
affected.
During the tests I noticed that the connections to the backend (3 servers)
were not divided equally; the counts were very different. The 1st had ~2200,
the 2nd ~150 and the 3rd ~60.
The counts also jumped around a lot: on the 1st it could be 2000, drop to 200,
rise to 3000 and so on.
I'm guessing this has to do with the backends being in different
availability zones (which would be different datacenters), so that network
latency delays the connections to the machine with the longest route?
(The load on the backends was equal, around 60% CPU.)
FYI, the haproxy is located in Amazon East, availability zone 1D. The three
backends are in B, C and D. Looking at the stats from HAProxy (attached) you
can see that the current connections to the backend in zone D are fairly low
compared to the other zones; zone B is worst, followed by zone C. Zone B is
on average 4-5 times worse than zone C.
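One rough way to sanity-check the latency theory is Little's law (concurrent
sessions = session rate x mean time a session stays open). The sketch below
only illustrates the arithmetic: the numbers are the "Session rate Cur" and
"Sessions Cur" columns from the attached stats snapshot (one process of the
nbproc=3 run), and the function name is made up for this example. A single
snapshot of fluctuating counters is a weak basis, so the resulting times are an
indication, not a measurement.

```python
# Little's law: L = lambda * W, so W = L / lambda.
# Values copied from the attached stats page (one haproxy process, one snapshot).
SNAPSHOT = {
    # server: (current sessions, current session rate per second)
    "i-2895xxxx-east-1b": (1089, 823),
    "i-a295xxxx-east-1c": (608, 823),
    "i-c894xxxx-east-1d": (2, 822),
}


def implied_session_time(cur_sessions, sessions_per_sec):
    """Mean time (seconds) a session stays open, as implied by Little's law."""
    if sessions_per_sec <= 0:
        raise ValueError("session rate must be positive")
    return cur_sessions / sessions_per_sec


IMPLIED = {
    name: implied_session_time(cur, rate)
    for name, (cur, rate) in SNAPSHOT.items()
}

for name, seconds in IMPLIED.items():
    print(f"{name}: ~{seconds * 1000:.0f} ms per session")
```

With the same session rate on every server, a server whose sessions stay open
longer holds proportionally more concurrent connections, which would be
consistent with the uneven counts.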
This is with nbproc=2, using taskset to bind the two haproxy processes to
CPUs 2 and 3.
I managed to push ~7500 req/s.
top - 14:32:45 up 4 days, 19:44, 1 user, load average: 0.79, 0.86, 0.59
Tasks: 82 total, 3 running, 79 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 93.2%id, 0.0%wa, 1.5%hi, 3.8%si, 1.5%st
Cpu1 : 0.0%us, 0.5%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 23.9%us, 41.3%sy, 0.0%ni, 16.5%id, 0.0%wa, 0.0%hi, 18.3%si, 0.0%st
Cpu3 : 21.3%us, 42.6%sy, 0.0%ni, 16.7%id, 0.0%wa, 0.0%hi, 19.4%si, 0.0%st
Mem: 15374136k total, 1172536k used, 14201600k free, 52024k buffers
Swap: 0k total, 0k used, 0k free, 242540k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1862 haproxy 20 0 148m 68m 652 R 93.3 0.5 5:42.97 haproxy
1861 haproxy 20 0 135m 62m 656 R 90.4 0.4 5:35.57 haproxy
Using only nbproc=2 without taskset gave me this. Look at %si: most of it is
on cpu0.
I managed to push ~6500 req/s, less than with taskset.
top - 14:51:56 up 4 days, 20:03, 1 user, load average: 1.76, 1.53, 0.98
Tasks: 82 total, 3 running, 79 sleeping, 0 stopped, 0 zombie
Cpu0 : 16.2%us, 21.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 62.6%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 22.4%us, 39.3%sy, 0.0%ni, 15.9%id, 0.0%wa, 0.0%hi, 22.4%si, 0.0%st
Mem: 15374136k total, 1216348k used, 14157788k free, 52064k buffers
Swap: 0k total, 0k used, 0k free, 242556k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1915 haproxy 20 0 181m 83m 656 R 99.3 0.6 8:19.49 haproxy
1916 haproxy 20 0 193m 88m 652 R 90.4 0.6 8:16.89 haproxy
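The softirq skew in the top output above can also be pulled out with a short
script, which helps when comparing several runs. This is a minimal sketch that
assumes the per-CPU line format shown here; the function names are invented for
the example and other top versions may print a different layout.

```python
import re

# Per-CPU lines copied from the "nbproc=2 without taskset" top output above.
TOP_LINES = """\
Cpu0 : 16.2%us, 21.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 62.6%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 22.4%us, 39.3%sy, 0.0%ni, 15.9%id, 0.0%wa, 0.0%hi, 22.4%si, 0.0%st
"""

# Matches pairs such as "62.6%si" and captures the number and the field name.
FIELD = re.compile(r"([\d.]+)%(\w+)")


def parse_cpu_lines(text):
    """Return {cpu_name: {field: percent}} for every 'CpuN : ...' line."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.strip().startswith("Cpu"):
            continue  # skip anything that is not a per-CPU line
        stats[name.strip()] = {key: float(val) for val, key in FIELD.findall(rest)}
    return stats


def busiest_softirq_cpu(stats):
    """Name of the CPU with the highest %si (softirq) value."""
    return max(stats, key=lambda cpu: stats[cpu]["si"])


cpu_stats = parse_cpu_lines(TOP_LINES)
print("highest %si on:", busiest_softirq_cpu(cpu_stats))
```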
Using nbproc=3 and taskset=1,2,3 gave worse results than nbproc=2 and
taskset=2,3.
I will run more tests with your suggestions (tcp-smart-connect +
tcp-smart-accept).
/E
-----Original Message-----
From: Willy Tarreau [mailto:[email protected]]
Sent: 8 October 2011 23:09
To: Erik Torlen
Cc: [email protected]
Subject: Re: HAProxy, multicores and EC2
On Sun, Oct 09, 2011 at 02:24:27AM +0000, Erik Torlen wrote:
> Thanks for the response Willy
>
> I agree with what you are saying.
> I have load-tested a lot of different machines/systems, and VMs never
> perform as well as a physical machine. However, in this case we have to use
> Amazon, so the focus is on getting the most out of a single instance and then
> scaling out with more machines to get more performance.
I see.
> Extra Large EC2 instances are "supposed" to be dedicated machines in the
> cloud; no one else should use them except you. But if I can't get HAProxy to
> use the XL instance properly, it could be better to have more Large instances
> instead (2 cores). That would reduce cost and make better use of the
> instances.
I agree. What is important in EC2 is to reduce the number of packets as much
as possible, as we noticed in the past that every packet has a huge cost.
Using keep-alive with the client (option http-server-close) saves some
packets on the client side and allows haproxy to use TCP RST to close the
server connection and save another packet on this side. Using both
"option tcp-smart-accept" and "option tcp-smart-connect" saves another
packet on each side. You should notice an improvement with these.
> And make stud use one of the cores and HAProxy the other?
Yes, possibly. If you need to run a lot of SSL on the machine, then I
suggest that you keep your XL machine. Recently, stud merged the patches
provided by our dev team at Exceliance, allowing it to scale using
multiple processes. In your case, you should bind all interrupts to
core #0, haproxy to core #1 and stud to all remaining cores. That way
you should get optimal performance.
> I have read about a lot of people who have tried stud. This example is
> interesting in this case because he assigns the different processes to
> different cores with cpuset:
> http://vincent.bernat.im/en/blog/2011-ssl-benchmark.html
>
> In my case, would cpuset be the same as taskset?
Yes, that's the same. There are a number of tools for the same thing.
The principle is always the same: they reduce the list of processors
a task is allowed to run on.
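As an illustration of that principle, the same restriction can be applied from
Python on Linux through os.sched_setaffinity, which wraps the same kernel call
that taskset uses. This is only a sketch: the helper name pin_to_cpus is
invented here, and intersecting with the currently allowed set is a cautious
choice for the example, not something taskset itself does.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict `pid` (0 = the calling process) to the given CPU ids.

    Returns the affinity set that is in effect afterwards.
    """
    allowed = os.sched_getaffinity(pid)
    # Only keep CPUs the task may already run on (assumption of this sketch).
    wanted = set(cpus) & allowed
    if not wanted:
        raise ValueError(
            f"none of {sorted(cpus)} is in the allowed set {sorted(allowed)}"
        )
    os.sched_setaffinity(pid, wanted)
    return os.sched_getaffinity(pid)


# Harmless demo: only report the current affinity, do not change it.
if hasattr(os, "sched_getaffinity"):
    print("currently allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

For example, pin_to_cpus([2, 3], pid) would give a running process the same
restriction as starting it under taskset with CPUs 2 and 3, provided both CPUs
are in its current set.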
Regards,
Willy
Statistics Report for pid 1962
> General process information
pid = 1962 (process #3, nbproc = 3)
uptime = 0d 0h04m13s
system limits: memmax = unlimited; ulimit-n = 200027
maxsock = 200027; maxconn = 100000; maxpipes = 0
current conns = 2019; current pipes = 0/0; conn rate = 93/sec
Running tasks: 2/2026; idle = 45 %
| Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server |
| Cur | Max | Limit | Cur | Max | Limit | Cur | Max | Limit | Total | LbTot | In | Out | Req | Resp | Req | Conn | Resp | Retr | Redis | Status | LastChk | Wght | Act | Bck | Chk | Dwn | Dwntme | Thrtle |
|---|
| Frontend | | 92 | 131 | - | 2018 | 2020 | 100000 | 13050 | | 132839048 | 679457079 | 0 | 0 | 0 | | | | | OPEN | |
| Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server |
| Cur | Max | Limit | Cur | Max | Limit | Cur | Max | Limit | Total | LbTot | In | Out | Req | Resp | Req | Conn | Resp | Retr | Redis | Status | LastChk | Wght | Act | Bck | Chk | Dwn | Dwntme | Thrtle |
|---|
| i-2895xxxx-east-1b | 0 | 0 | - | 823 | 1255 | | 1089 | 1830 | - | 120397 | 119600 | 44081104 | 225226042 | | 0 | | 0 | 74 | 810 | 0 | 4m13s UP | L7OK/200 in 430ms | 1 | Y | - | 0 | 0 | 0s | - |
| i-a295xxxx-east-1c | 0 | 0 | - | 823 | 1187 | | 608 | 1025 | - | 119604 | 119600 | 44267236 | 226677239 | | 0 | | 0 | 0 | 4 | 0 | 4m13s UP | L7OK/200 in 44ms | 1 | Y | - | 0 | 0 | 0s | - |
| i-c894xxxx-east-1d | 0 | 0 | - | 822 | 1188 | | 2 | 773 | - | 119599 | 119599 | 44490708 | 227553798 | | 0 | | 0 | 0 | 0 | 0 | 4m13s UP | L7OK/200 in 39ms | 1 | Y | - | 0 | 0 | 0s | - |
| Backend | 0 | 0 | | 2468 | 3562 | | 1712 | 1911 | 15000 | 358799 | 358799 | 132839048 | 679457079 | 0 | 0 | | 0 | 74 | 814 | 0 | 4m13s UP | | 3 | 3 | 0 | | 0 | 0s | |
| Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server |
| Cur | Max | Limit | Cur | Max | Limit | Cur | Max | Limit | Total | LbTot | In | Out | Req | Resp | Req | Conn | Resp | Retr | Redis | Status | LastChk | Wght | Act | Bck | Chk | Dwn | Dwntme | Thrtle |
|---|
| Frontend | | 1 | 2 | - | 1 | 1 | 100000 | 25 | | 13029 | 330036 | 0 | 0 | 0 | | | | | OPEN | |
| Backend | 0 | 0 | | 0 | 0 | | 0 | 0 | 10000 | 0 | 0 | 13029 | 330036 | 0 | 0 | | 0 | 0 | 0 | 0 | 4m13s UP | | 0 | 0 | 0 | | 0 | | |