Hi Godbach,

On Tue, Sep 25, 2012 at 06:30:25PM +0800, Godbach wrote:
> Hi, Willy
> 
>     I ran the haproxy performance test again today.
> 
>     My tester has eight Gigabit ports: four are aggregated to emulate
> clients, and the other four are aggregated to emulate servers. So the
> maximum throughput is expected to be 4 Gbps.
> 
>    Whether splice is enabled or disabled (with the -dS option or at
> compile time), the throughput is about 2.8 Gbps under the following
> conditions:
> 1) HTTP object size is 1 MB
> 2) max concurrent sessions is 10,000
> 3) one HTTP transaction on each connection.
>  Enabling splice did not improve the throughput.
> 
>     The following settings were applied according to your suggestions:
> 1. kernel version: 3.5.0
> 2. haproxy version: 1.5-dev12
> 3. haproxy config added: tune.pipesize: 524288
> 4. sysctl:
>   net.ipv4.tcp_rmem = 4096        262144  16745216
>   net.ipv4.tcp_wmem = 4096        262144  16745216
> 5. haproxy running on core 0, and network interrupts are sent to core 1.
> 6. LRO is enabled
> Offload parameters for eth1(eth3):
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: on
> rx-vlan-offload: off
> tx-vlan-offload: off
> ntuple-filters: off
> receive-hashing: on
> 
> The following are CPU usage and interrupts on different cores:
> 
> top - 17:12:30 up 23:10,  3 users,  load average: 0.62, 0.60, 0.53
> Tasks:  99 total,   3 running,  96 sleeping,   0 stopped,   0 zombie
> Cpu0  :  4.8%us, 30.3%sy,  0.0%ni, 63.8%id,  0.0%wa,  0.3%hi,  0.7%si,  0.0%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,  7.0%id,  0.0%wa,  2.3%hi, 90.6%si,  0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

As you can see, most of the time is spent in softirq (network stack+driver):
Cpu1 shows 90.6% si. Do you know how many packets were processed per second?
The interrupt rate needs to be checked too.
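
In case it helps, here is a rough sketch of how both rates could be
measured from two samples of /proc/net/dev and /proc/interrupts. The
function names are made up for this example, and the interface name you
pass ("eth1" below) is only an assumption based on your ethtool output.

```python
# Sketch only: helper names are hypothetical, not part of any tool.

def parse_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # field 1 is RX packets, field 9 is TX packets
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, seconds):
    """Return {iface: (rx_pps, tx_pps)} between two parse_net_dev() samples."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        rx0, tx0 = before.get(iface, (rx1, tx1))
        rates[iface] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


def parse_interrupts(text, match):
    """Return {irq: [count per CPU]} for /proc/interrupts lines whose
    description contains `match` (for example an interface name)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        fields = rest.split()
        if match not in " ".join(fields[ncpu:]):
            continue
        totals[irq.strip()] = [int(x) for x in fields[:ncpu]]
    return totals


def irq_rates(before, after, seconds):
    """Return {irq: [interrupts per second per CPU]} between two samples."""
    return {
        irq: [(n1 - n0) / seconds
              for n0, n1 in zip(before.get(irq, counts), counts)]
        for irq, counts in after.items()
    }


def read_file(path):
    """Read one /proc file; call it twice, a known interval apart."""
    with open(path) as f:
        return f.read()
```

To use it, read each file once, sleep for a second or so, read it again,
and pass the two parsed samples plus the interval to packet_rates() or
irq_rates().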

You can get some advanced stats using ethtool -S on each device (checking
just one on the input path is enough to start with).
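
If you want to compare those counters between runs, something like the
following could parse the ethtool -S output. This is only a sketch: the
helper names and the keyword list are my own assumptions, and the counter
names themselves depend on the driver, so adjust the keywords to whatever
your NIC reports.

```python
import subprocess


def parse_ethtool_stats(text):
    """Return {counter_name: value} from the text printed by `ethtool -S`."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # "NIC statistics:" has an empty value and is skipped here
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_ethtool_stats(dev):
    """Run `ethtool -S <dev>` and parse its output (may need root)."""
    out = subprocess.check_output(["ethtool", "-S", dev])
    return parse_ethtool_stats(out.decode())


def nonzero_matching(stats,
                     keywords=("drop", "miss", "err", "fifo", "no_buffer")):
    """Keep non-zero counters whose name contains one of `keywords`.
    The keyword list is a guess; counter names are driver-specific."""
    return {
        name: value
        for name, value in stats.items()
        if value and any(word in name.lower() for word in keywords)
    }
```

For example, nonzero_matching(read_ethtool_stats("eth1")) would list only
the drop/error style counters that are not zero, assuming eth1 is the
input interface.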

If you can't make the interrupt processing consume less CPU, you can try to
spread your interrupts over multiple cores, provided that you're not using
the same core as haproxy.
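
As an illustration of spreading the interrupts, here is a small sketch
that assigns IRQs round-robin to every core except the one reserved for
haproxy, and produces the hex masks expected by /proc/irq/<n>/smp_affinity.
The IRQ numbers in the usage note are invented; take the real ones from
/proc/interrupts. Note that a running irqbalance daemon may overwrite
these settings.

```python
def cpu_mask(cpu):
    """Hex bitmask selecting a single CPU, as written to smp_affinity.
    Good for machines with up to 32 CPUs; larger ones use comma-separated
    32-bit groups, which this sketch does not handle."""
    return "%x" % (1 << cpu)


def plan_affinity(irqs, cpus, reserved):
    """Assign each IRQ to one CPU, round-robin, skipping `reserved` CPUs
    (for example the core haproxy is bound to)."""
    usable = [c for c in cpus if c not in reserved]
    if not usable:
        raise ValueError("no CPU left for interrupts")
    return {
        irq: cpu_mask(usable[i % len(usable)])
        for i, irq in enumerate(irqs)
    }


def apply_affinity(plan):
    """Write the planned masks; needs root and is not called by default."""
    for irq, mask in plan.items():
        with open("/proc/irq/%s/smp_affinity" % irq, "w") as f:
            f.write(mask)
```

With haproxy on core 0 of a 4-core box, plan_affinity([45, 46, 47, 48],
range(4), {0}) maps the four (hypothetical) IRQs to cores 1, 2, 3 and 1
again.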

You need to saturate either the network or enough CPUs, but right now, given
that haproxy's core (Cpu0) is only about 35% busy, there is some room for
improvement.

Regards,
Willy

