Hi,

I ran some more tests, this time with taskset, to see how it affects
performance.

During the tests I noticed that the connections to the backends (3 servers)
were not evenly distributed; the counts differed a lot. The 1st had ~2200,
the 2nd ~150 and the 3rd ~60. The numbers also jumped around a lot: on the
1st it could be 2000, drop to 200, go up to 3000 and so on.

My guess is that this has to do with the backends being in different
availability zones (which would be different datacenters), so that network
latency delays the connections to the machine with the longest route?
(The load on the backends was equal, around 60% CPU.)

FYI, haproxy is located in Amazon east, availability zone 1D. The three
backends are in zones B, C and D. Looking at the stats from HAProxy
(attached), you can see that the number of current connections to the
backend in zone D is fairly low compared to the other zones; zone B is
worst, followed by zone C. Zone B is on average 4-5 times worse than zone C.

This is with nbproc=2, using taskset to bind both haproxy processes to
CPUs 2 and 3.
I managed to push ~7500 req/s.

top - 14:32:45 up 4 days, 19:44,  1 user,  load average: 0.79, 0.86, 0.59
Tasks:  82 total,   3 running,  79 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 93.2%id,  0.0%wa,  1.5%hi,  3.8%si,  1.5%st
Cpu1  :  0.0%us,  0.5%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 23.9%us, 41.3%sy,  0.0%ni, 16.5%id,  0.0%wa,  0.0%hi, 18.3%si,  0.0%st
Cpu3  : 21.3%us, 42.6%sy,  0.0%ni, 16.7%id,  0.0%wa,  0.0%hi, 19.4%si,  0.0%st
Mem:  15374136k total,  1172536k used, 14201600k free,    52024k buffers
Swap:        0k total,        0k used,        0k free,   242540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1862 haproxy   20   0  148m  68m  652 R 93.3  0.5   5:42.97 haproxy
 1861 haproxy   20   0  135m  62m  656 R 90.4  0.4   5:35.57 haproxy


Using only nbproc=2 without taskset gave me this. Look at %si: most of it
lands on cpu0.
I managed to push ~6500 req/s, which is less than with taskset.

top - 14:51:56 up 4 days, 20:03,  1 user,  load average: 1.76, 1.53, 0.98
Tasks:  82 total,   3 running,  79 sleeping,   0 stopped,   0 zombie
Cpu0  : 16.2%us, 21.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 62.6%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 22.4%us, 39.3%sy,  0.0%ni, 15.9%id,  0.0%wa,  0.0%hi, 22.4%si,  0.0%st
Mem:  15374136k total,  1216348k used, 14157788k free,    52064k buffers
Swap:        0k total,        0k used,        0k free,   242556k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1915 haproxy   20   0  181m  83m  656 R 99.3  0.6   8:19.49 haproxy
 1916 haproxy   20   0  193m  88m  652 R 90.4  0.6   8:16.89 haproxy


Using nbproc=3 and taskset=1,2,3 gave worse results than nbproc=2 and
taskset=2,3.

I will make more tests with your suggestions (tcp-smart-connect + 
tcp-smart-accept).
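
For reference, a minimal sketch of what that next test configuration could
look like. It is only a fragment, not the full config used in these tests:
the section layout and "mode http" are assumptions, and the three options
are the ones suggested in the quoted mail below.

```
# Fragment only; frontend/backend sections and timeouts are omitted.
global
    nbproc 2                      # two processes, as in the best run above

defaults
    mode http                     # assumption: plain HTTP load balancing
    option http-server-close      # keep-alive with the client, close on the server side
    option tcp-smart-accept       # saves a packet on the client side
    option tcp-smart-connect      # saves a packet on the server side

# CPU pinning is done outside haproxy, for example by starting it under
# "taskset -c 2,3" (the exact invocation here is an assumption).
```

Both smart options are placed in "defaults" because tcp-smart-accept applies
to the frontend side and tcp-smart-connect to the backend side.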

/E

-----Original Message-----
From: Willy Tarreau [mailto:[email protected]] 
Sent: 8 October 2011 23:09
To: Erik Torlen
Cc: [email protected]
Subject: Re: HAProxy, multicores and EC2

On Sun, Oct 09, 2011 at 02:24:27AM +0000, Erik Torlen wrote:
> Thanks for the response Willy
> 
> I agree with what you are saying.
> I have load-tested a lot of different machines/systems and the VMs never
> perform as well as a physical machine. However, in this case we have to
> use Amazon, so the focus is more on getting the most out of a single
> instance and then scaling with more machines to get more performance.

I see.

> Extra large EC2 instances are "supposed" to be dedicated machines in the
> cloud; no one else should use them except you. But if I can't get HAProxy
> to use the XL instance properly, it could be better to have more Large
> instances instead (2 cores). That would reduce cost and make better use
> of the instances.

I agree. What is important in EC2 is to reduce the number of packets as much
as possible, as we noticed in the past that every packet has a huge cost.
Using keep-alive with the client (option http-server-close) saves some
packets on the client side and allows haproxy to use TCP RST to close the
server connection and save another packet on this side. Using both
"option tcp-smart-accept" and "option tcp-smart-connect" saves another
packet on each side. You should notice an improvement with these.

> And make stud use one of the cores and HAProxy the other?

Yes, possibly. If you need to run a lot of SSL on the machine, then I
suggest that you keep your XL machine. Recently, stud merged the patches
provided by our dev team at Exceliance, allowing it to scale using
multiple processes. In your case, you should stick all interrupts to
core #0, haproxy to core #1 and stud to all remaining cores. That way
you should get optimal performance.

> I have read about a lot of people who have tried stud. This example is
> interesting in this case because he assigns the different processes to
> different cores with cpuset:
> http://vincent.bernat.im/en/blog/2011-ssl-benchmark.html
>
> In my case, would cpuset be the same as taskset?

Yes, that's the same. There are a number of tools for the same thing.
The principle is always the same : they reduce the list of processors
a task is allowed to run on.

Regards,
Willy

Title: Statistics Report for HAProxy

HAProxy version 1.5-dev7, released 2011/09/10

Statistics Report for pid 1962


> General process information

pid = 1962 (process #3, nbproc = 3)
uptime = 0d 0h04m13s
system limits: memmax = unlimited; ulimit-n = 200027
maxsock = 200027; maxconn = 100000; maxpipes = 0
current conns = 2019; current pipes = 0/0; conn rate = 93/sec
Running tasks: 2/2026; idle = 45 %

main
Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server
Cur Max Limit | Cur Max Limit | Cur Max Limit Total LbTot | In Out | Req Resp | Req Conn Resp | Retr Redis | Status LastChk Wght Act Bck Chk Dwn Dwntme Thrtle
Frontend92131-2018202010000013050132839048679457079000OPEN

bh_frontends
Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server
Cur Max Limit | Cur Max Limit | Cur Max Limit Total LbTot | In Out | Req Resp | Req Conn Resp | Retr Redis | Status LastChk Wght Act Bck Chk Dwn Dwntme Thrtle
i-2895xxxx-east-1b00-823125510891830-12039711960044081104225226042007481004m13s UP L7OK/200 in 430ms1Y-000s-
i-a295xxxx-east-1c00-82311876081025-11960411960044267236226677239000404m13s UP L7OK/200 in 44ms1Y-000s-
i-c894xxxx-east-1d00-82211882773-11959911959944490708227553798000004m13s UP L7OK/200 in 39ms1Y-000s-
Backend002468356217121911150003587993587991328390486794570790007481404m13s UP 330 00s

stats
Queue | Session rate | Sessions | Bytes | Denied | Errors | Warnings | Server
Cur Max Limit | Cur Max Limit | Cur Max Limit Total LbTot | In Out | Req Resp | Req Conn Resp | Retr Redis | Status LastChk Wght Act Bck Chk Dwn Dwntme Thrtle
Frontend12-111000002513029330036000OPEN
Backend0000001000000130293300360000004m13s UP 000 0 
