RE: HAProxy, multicores and EC2

Erik Torlen Mon, 10 Oct 2011 12:31:24 -0700

Hi,

I ran some more tests, this time with taskset, to see how it affects 
performance.

I noticed during the tests that the connections to the backend (3 servers) 
were not divided equally; the counts were instead very different. The 1st was 
~2200, the 2nd ~150 and the 3rd ~60. The numbers also jumped around a lot: on 
the 1st it could be 2000, then down to 200, then up to 3000, and so on.

I'm guessing that this has to do with the backends being in different 
availability zones (which would be different datacenters), so that network 
latency delays the connections to the machine with the longest route? 
(The load on the backends was equal, around 60% CPU.)

FYI, haproxy is located in Amazon east, availability zone 1D. The three 
backends are in B, C and D. Looking at the stats from HAProxy (attached), you 
can see that the current connections to the backend in zone D are fairly low 
compared to the other zones; zone B is worst, followed by zone C. Zone B is on 
average 4-5 times worse than zone C.
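
To put rough numbers on the latency guess, here is a small sketch that applies 
Little's law (concurrent sessions = session rate x mean session time) to one 
snapshot of the attached stats page. The figures are copied from the "Sessions 
Cur" and "Session rate Cur" columns of the bh_frontends rows; treating a single 
snapshot as a steady-state average is an assumption, and the function names 
are made up for this example.

```python
# Little's law: mean concurrency = arrival rate * mean time in system,
# so the implied mean time in system = concurrency / arrival rate.

# (server, current sessions, current session rate per second), copied from
# the bh_frontends rows of the attached stats page. One snapshot only.
SNAPSHOT = [
    ("i-2895xxxx-east-1b", 1089, 823),
    ("i-a295xxxx-east-1c", 608, 823),
    ("i-c894xxxx-east-1d", 2, 822),
]


def implied_session_time(current_sessions, sessions_per_second):
    """Return the mean session duration in seconds implied by Little's law."""
    if sessions_per_second <= 0:
        raise ValueError("session rate must be positive")
    return current_sessions / sessions_per_second


def summarize(snapshot):
    """Map each server name to its implied mean session time in seconds."""
    return {
        name: implied_session_time(cur, rate) for name, cur, rate in snapshot
    }


if __name__ == "__main__":
    for name, seconds in summarize(SNAPSHOT).items():
        print(f"{name}: ~{seconds * 1000:.0f} ms per session")
```

With near-identical session rates per server, the large differences in open 
connections translate directly into different implied session times: roughly 
1.3 s for zone B, 0.7 s for zone C and a few milliseconds for zone D in this 
snapshot.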

This is with nbproc=2, using taskset to bind the two haproxy processes to 
CPUs 2 and 3.
I managed to push ~7500 req/s.

top - 14:32:45 up 4 days, 19:44,  1 user,  load average: 0.79, 0.86, 0.59
Tasks:  82 total,   3 running,  79 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 93.2%id,  0.0%wa,  1.5%hi,  3.8%si,  1.5%st
Cpu1  :  0.0%us,  0.5%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 23.9%us, 41.3%sy,  0.0%ni, 16.5%id,  0.0%wa,  0.0%hi, 18.3%si,  0.0%st
Cpu3  : 21.3%us, 42.6%sy,  0.0%ni, 16.7%id,  0.0%wa,  0.0%hi, 19.4%si,  0.0%st
Mem:  15374136k total,  1172536k used, 14201600k free,    52024k buffers
Swap:        0k total,        0k used,        0k free,   242540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1862 haproxy   20   0  148m  68m  652 R 93.3  0.5   5:42.97 haproxy
 1861 haproxy   20   0  135m  62m  656 R 90.4  0.4   5:35.57 haproxy

Using only nbproc=2 without taskset gave me this. Look at %si: most of it is 
on cpu0.
I managed to push ~6500 req/s, less than with taskset.

top - 14:51:56 up 4 days, 20:03,  1 user,  load average: 1.76, 1.53, 0.98
Tasks:  82 total,   3 running,  79 sleeping,   0 stopped,   0 zombie
Cpu0  : 16.2%us, 21.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 62.6%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 22.4%us, 39.3%sy,  0.0%ni, 15.9%id,  0.0%wa,  0.0%hi, 22.4%si,  0.0%st
Mem:  15374136k total,  1216348k used, 14157788k free,    52064k buffers
Swap:        0k total,        0k used,        0k free,   242556k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1915 haproxy   20   0  181m  83m  656 R 99.3  0.6   8:19.49 haproxy
 1916 haproxy   20   0  193m  88m  652 R 90.4  0.6   8:16.89 haproxy
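
To quantify how lopsided the softirq load is in the run without taskset, the 
following sketch parses the per-CPU lines of the top output above and computes 
each CPU's share of the total %si. The parsing helpers are written for this 
example only and assume the exact "CpuN : x%us, ..." layout shown here.

```python
import re

# Per-CPU lines copied from the top output of the run without taskset.
TOP_LINES = """\
Cpu0  : 16.2%us, 21.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 62.6%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 22.4%us, 39.3%sy,  0.0%ni, 15.9%id,  0.0%wa,  0.0%hi, 22.4%si,  0.0%st
"""

# Matches one "<number>%<field>" pair, e.g. "62.6%si" or "100.0%id".
FIELD = re.compile(r"(\d+(?:\.\d+)?)%(\w+)")


def parse_top_cpu_lines(text):
    """Return {cpu_index: {field_name: percent}} for each 'CpuN :' line."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"Cpu(\d+)\s*:(.*)", line)
        if not match:
            continue
        fields = {
            name: float(value) for value, name in FIELD.findall(match.group(2))
        }
        result[int(match.group(1))] = fields
    return result


def softirq_share(cpus):
    """Return each CPU's fraction of the %si summed over all CPUs."""
    total = sum(fields.get("si", 0.0) for fields in cpus.values())
    if total == 0:
        return {cpu: 0.0 for cpu in cpus}
    return {cpu: fields.get("si", 0.0) / total for cpu, fields in cpus.items()}


if __name__ == "__main__":
    shares = softirq_share(parse_top_cpu_lines(TOP_LINES))
    for cpu, share in sorted(shares.items()):
        print(f"Cpu{cpu}: {share:.0%} of total softirq time")
```

On these figures cpu0 carries about three quarters of the softirq time and 
cpu3 the rest, while cpu1 and cpu2 sit idle.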

Using nbproc=3 and taskset=1,2,3 gave worse results than nbproc=2 and 
taskset=2,3.

I will run more tests with your suggestions (tcp-smart-connect + 
tcp-smart-accept).

/E

-----Original Message-----
From: Willy Tarreau [mailto:[email protected]] 
Sent: 8 October 2011 23:09
To: Erik Torlen
Cc: [email protected]
Subject: Re: HAProxy, multicores and EC2

On Sun, Oct 09, 2011 at 02:24:27AM +0000, Erik Torlen wrote:
> Thanks for the response Willy
> 
> I agree with what you are saying.
> I have load-tested a lot of different machines/systems and the VMs never 
> perform as well as a physical machine. However, in this case we have to use 
> Amazon, so the focus is more on getting the most out of a single instance 
> and then scaling out with more machines to get more performance.

I see.

> Extra Large EC2 instances are "supposed" to be dedicated machines in the 
> cloud; no one else should use them except you. But if I can't get HAProxy 
> to use the XL instance properly, it could be better to have more Large 
> instances (2 cores) instead. That would reduce cost and make better use of 
> the instances.

I agree. What is important in EC2 is to reduce the number of packets as much
as possible, as we noticed in the past that every packet has a huge cost.
Using keep-alive with the client (option http-server-close) saves some
packets on the client side and allows haproxy to use TCP RST to close the
server connection and save another packet on this side. Using both
"option tcp-smart-accept" and "option tcp-smart-connect" saves another
packet on each side. You should notice an improvement with these.

> And make stud use one of the cores and HAProxy the other?

Yes, possibly. If you need to run a lot of SSL on the machine, then I
suggest that you keep your XL machine. Recently, stud merged the patches
provided by our dev team at Exceliance, allowing it to scale using
multiple processes. In your case, you should bind all interrupts to
core #0, haproxy to core #1 and stud to all remaining cores. That way
you should get optimal performance.
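
As an illustration of that layout, here is a minimal sketch that splits the 
cores of a machine into the three roles and prints the matching hexadecimal 
CPU masks, which is the format accepted by /proc/irq/<n>/smp_affinity and by 
taskset when given a mask. The function names and the dictionary keys are 
invented for this example; it only computes the plan and changes nothing on 
the system.

```python
def plan_core_layout(core_count):
    """Assign core 0 to interrupts, core 1 to haproxy, the rest to stud."""
    if core_count < 3:
        raise ValueError("this layout needs at least 3 cores")
    return {
        "interrupts": [0],
        "haproxy": [1],
        "stud": list(range(2, core_count)),
    }


def cpu_mask(cores):
    """Return a hex bitmask string with bit N set for every core N."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")


if __name__ == "__main__":
    # 4 cores, as on the machine shown in the top output of this thread.
    for role, cores in plan_core_layout(4).items():
        print(f"{role}: cores {cores}, mask 0x{cpu_mask(cores)}")
```

On a 4-core machine this gives mask 0x1 for interrupts, 0x2 for haproxy and 
0xc for stud.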

> I have read about a lot of people who have tried stud. This example is 
> interesting in this case because he assigns the different processes to 
> different cores with cpuset: 
> http://vincent.bernat.im/en/blog/2011-ssl-benchmark.html
> 
> In my case, would cpuset be the same as taskset? 

Yes, that's the same. There are a number of tools for the same thing.
The principle is always the same : they reduce the list of processors
a task is allowed to run on.
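
The same principle can be shown without any external tool. The sketch below 
uses Python's os.sched_setaffinity and os.sched_getaffinity, which are only 
available on Linux, to restrict the calling process to a single CPU. The 
helper name restrict_to is made up for this example, and the CPU is picked 
from the set the process is already allowed to use so that the call is valid 
on any machine.

```python
import os


def restrict_to(cpus, pid=0):
    """Restrict process `pid` (0 = the calling process) to the given CPUs.

    Returns the affinity set reported by the kernel afterwards.
    """
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    print("allowed before:", sorted(allowed))
    # Keep only the highest-numbered CPU this process may already use.
    print("allowed after: ", sorted(restrict_to({max(allowed)})))
```

Running taskset with a CPU list on an already running process has the same 
effect as calling restrict_to with that process's pid.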

Regards,
Willy

Title: Statistics Report for HAProxy

HAProxy version 1.5-dev7, released 2011/09/10

Statistics Report for pid 1962

> General process information

pid = 1962 (process #3, nbproc = 3)
uptime = 0d 0h04m13s
system limits: memmax = unlimited; ulimit-n = 200027
maxsock = 200027; maxconn = 100000; maxpipes = 0
current conns = 2019; current pipes = 0/0; conn rate = 93/sec
Running tasks: 2/2026; idle = 45 %

Note: UP with load-balancing disabled is reported as "NOLB".

main

	Queue			Session rate			Sessions					Bytes		Denied		Errors			Warnings		Server
	Cur	Max	Limit	Cur	Max	Limit	Cur	Max	Limit	Total	LbTot	In	Out	Req	Resp	Req	Conn	Resp	Retr	Redis	Status	LastChk	Wght	Act	Bck	Chk	Dwn	Dwntme	Thrtle
Frontend				92	131	-	2018	2020	100000	13050		132839048	679457079	0	0	0					OPEN

bh_frontends

	Queue			Session rate			Sessions					Bytes		Denied		Errors			Warnings		Server
	Cur	Max	Limit	Cur	Max	Limit	Cur	Max	Limit	Total	LbTot	In	Out	Req	Resp	Req	Conn	Resp	Retr	Redis	Status	LastChk	Wght	Act	Bck	Chk	Dwn	Dwntme	Thrtle
i-2895xxxx-east-1b	0	0	-	823	1255		1089	1830	-	120397	119600	44081104	225226042		0		0	74	810	0	4m13s UP	L7OK/200 in 430ms	1	Y	-	0	0	0s	-
i-a295xxxx-east-1c	0	0	-	823	1187		608	1025	-	119604	119600	44267236	226677239		0		0	0	4	0	4m13s UP	L7OK/200 in 44ms	1	Y	-	0	0	0s	-
i-c894xxxx-east-1d	0	0	-	822	1188		2	773	-	119599	119599	44490708	227553798		0		0	0	0	0	4m13s UP	L7OK/200 in 39ms	1	Y	-	0	0	0s	-
Backend	0	0		2468	3562		1712	1911	15000	358799	358799	132839048	679457079	0	0		0	74	814	0	4m13s UP		3	3	0		0	0s

stats

	Queue			Session rate			Sessions					Bytes		Denied		Errors			Warnings		Server
	Cur	Max	Limit	Cur	Max	Limit	Cur	Max	Limit	Total	LbTot	In	Out	Req	Resp	Req	Conn	Resp	Retr	Redis	Status	LastChk	Wght	Act	Bck	Chk	Dwn	Dwntme	Thrtle
Frontend				1	2	-	1	1	100000	25		13029	330036	0	0	0					OPEN
Backend	0	0		0	0		0	0	10000	0	0	13029	330036	0	0		0	0	0	0	4m13s UP		0	0	0		0
