Hi,
On Wed, Apr 18, 2012 at 01:26:21PM +0530, Saifuddin Kaijar wrote:
> Hi Willy,
>
> Thanks once again for your previous suggestion about running haproxy on
> multiple cores. Now I am able to run it, and performance has also
> increased.
>
> I have a few more questions I want to share with you. For my test with
> HAProxy I am using *httperf* on the client side and *lighttpd* on the
> server side. My setup is as follows:
>
>
> Client 1 (httperf) \                    / Server 1 (lighttpd)
>                     \    _________     /
>                      \  |         |   /
>                         | HAProxy |
>                      /  |_________|   \
>                     /                  \
> Client 2 (httperf) /                    \ Server 2 (lighttpd)
>
>
> With the above setup I am getting 20k requests/s at most.
>
> But on your site you mentioned that you use *inject* on the client side
> and *httpterm* on the server side, so I also tried those for my test.
> The commands are as follows:
>
> Client Side : ./inject -T 1000 -G <server>:<port>/ -o 900 -u 100
This setting simply cannot work. 900 objects * 100 users = 90000
concurrent connections, while by default you're limited to about 1000 file
descriptors. And that's without mentioning that inject does not scale with
parallel connections. If you want to achieve high connection rates, you
need to remove -T, keep -o low (e.g. 4), and set -u between 10 and 250.
With fast NICs, a lower -u will be enough to fill the pipe and will reduce
the CPU usage on inject. With slower NICs, you'll have to increase -u to
cover the NIC latency, at the expense of wasted CPU on inject.
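To make the arithmetic concrete, here is a small sketch (the corrected inject
invocation at the end is illustrative, with the target host:port left as a
placeholder exactly as above):

```shell
#!/bin/sh
# 900 objects per user times 100 users would require 90000 concurrent
# connections, far beyond a typical default fd limit (often 1024).
objects=900
users=100
echo "requested concurrency: $((objects * users))"   # prints 90000
echo "current fd limit: $(ulimit -n)"

# A more reasonable invocation along the lines suggested above:
#   ./inject -G <server>:<port>/ -o 4 -u 100
```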
> Server side : httpterm -L <haproxy-server>:<port>
>
> With this I am getting 27k requests/s at most. Please see the attached
> *haproxy.cfg* file for further details.
>
> For all my tests I am using a 1G port.
>
> Now my question is: how can I get more performance, since you mention
> 40k on your site?
At these rates, it depends a lot on three things :
- the object size
- the NIC latency
- interrupt distribution
The first point translates into bandwidth and packet rate. You need to know
that it is possible to saturate a gig link with a very small bandwidth, as
the minimum frame size on gigE is 512 *bytes*. However, NICs are able to
aggregate multiple L2 frames into a single 512-byte frame, so as long as
the inter-frame latency is low enough, you can get a high packet rate. At
lower packet rates, you simply send a 512-byte frame for each packet,
resulting in very poor bandwidth usage. At 27k connections per second,
assuming these are small objects (less than 1.5kB) you're normally at
8*27=216 kpps on each side. If you're using a single NIC for all your
tests, you're sending 216 kpps and receiving 216 kpps. To simplify, let's
say that each packet is 64 bytes except the request and the response, 300
bytes each. You have 6*64+2*300 ~= 1kB per request and per side. At 27krps,
this is already 1/4 gigabit in each direction if both sides are on the same
NIC. Now in the worst case, if all of your packets were padded to 512 bytes,
you'd have 8*512*27k = ~1 Gbps in each direction. So you see, these numbers
are not that unreasonable in the end.
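The back-of-the-envelope numbers above can be checked with simple shell
arithmetic (the packet counts and sizes are the same rough estimates used
above, not measurements):

```shell
#!/bin/sh
# Rough model of one HTTP/1.0 exchange: 8 packets per request, of which
# the request and the response carry ~300 bytes each and the remaining
# six control packets (SYN, SYN-ACK, ACKs, FINs) are ~64 bytes each.
per_req=$((6 * 64 + 2 * 300))                              # ~984 bytes, ~1kB
echo "bytes per request per side: $per_req"

# At 27k requests/s, bandwidth per direction in Mbps:
echo "typical: $((per_req * 27000 * 8 / 1000000)) Mbps"    # prints 212, ~1/4 gigabit

# Worst case: all 8 packets padded to the 512-byte gigE slot size:
echo "worst case: $((8 * 512 * 27000 * 8 / 1000000)) Mbps" # prints 884, ~1 Gbps
```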
When I reach 40k requests/s, it's with two distinct gig ports and with haproxy
saving some packets :
- option tcp-smart-accept saves one packet with the client
- option tcp-smart-connect saves another packet to the server
Each connection then takes 6 packets instead of 8. If you observe a performance
increase from this, it means you're either NIC-bound or CPU-bound in the network
layer, possibly due to some misconfiguration (iptables, ...).
With inject, you can even save one more packet using -F (same as smart-connect).
Let's observe if that changes anything.
In haproxy, you should use "option http-server-close" which will maintain
keep-alive with capable clients (httperf does, inject does not) and more
importantly will actively close the connection to the server, resulting in
5 packets per session exchanged with the server.
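As a rough sketch, the options discussed above might look like this in
haproxy.cfg (section names, addresses and ports are purely illustrative;
tcp-smart-accept belongs on the frontend/listen side and tcp-smart-connect
on the backend side):

```
frontend www
    bind :8080
    option http-server-close   # keep-alive with capable clients,
                               # active close towards the servers
    option tcp-smart-accept    # save one packet with the client
    default_backend servers

backend servers
    option tcp-smart-connect   # save one packet to the server
    server s1 192.168.0.11:80
    server s2 192.168.0.12:80
```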
You should also remove the stats options from the listener you're testing
and move them to a dedicated listener. Nobody running at high rates wants
to waste time matching a stats URL against every request when it can be
served more efficiently on a dedicated ip:port.
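A dedicated stats listener could be sketched like this (the address and
port are illustrative placeholders):

```
# Stats moved off the load-balancing listener onto its own ip:port,
# so the fast path never has to match the stats URL:
listen stats
    bind 127.0.0.1:8888
    stats enable
    stats uri /
```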
Also, you need to check using "top" that haproxy reaches a steady 100% CPU.
Until you see this, you're wasting time somewhere else (network stack, IRQs,
etc...).
"option httplog" and "log global" are very expensive; together they can
increase CPU usage by around 20%, so you should remove them if you want
maximum throughput.
If I were you, I would start with a single process pinned to a specific core
(taskset -c) and ensure that interrupts are delivered to another core close
to this one (echo XXX > /proc/irq/$irq/smp_affinity). And of course, check
that irqbalance is not running.
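The pinning steps above can be sketched as a dry run that only prints the
commands (the CPU number, affinity mask, IRQ number and config path are all
placeholders; adjust them to your own topology and drop the leading echo to
actually apply them):

```shell
#!/bin/sh
HAPROXY_CPU=0      # core to pin the haproxy process to (placeholder)
IRQ_CPU_MASK=2     # hex affinity mask selecting a nearby core (placeholder)
NIC_IRQ=42         # the NIC's IRQ number, see /proc/interrupts (placeholder)

# Print the commands instead of running them; remove 'echo' to apply.
echo "taskset -c $HAPROXY_CPU haproxy -f /etc/haproxy/haproxy.cfg"
echo "echo $IRQ_CPU_MASK > /proc/irq/$NIC_IRQ/smp_affinity"
echo "killall irqbalance"   # make sure irqbalance does not move IRQs around
```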
With such techniques I once managed to reach 68k requests/s on a Core
i5/3.3GHz using two distinct Myricom 10G NICs, with inject -F and httpterm,
and 87k with "ab -k". This is obviously irrelevant to many setups, but
what's important is to understand which settings impact performance so
that people can choose depending on their usage.
Regards,
Willy