Hi, Willy

    I ran haproxy performance tests again today.

    My tester has eight gigabit ports: four aggregated to emulate
clients, and the other four aggregated to emulate servers. So the
maximum expected throughput is 4 Gbps.

   Whether splice is enabled or disabled (with the -dS option, or
disabled at compile time), the throughput is roughly 2.8 Gbps under
the following conditions:
1) HTTP object size is 1 MB
2) maximum concurrent sessions: 10,000
3) one HTTP transaction per connection.
 Enabling splice did not improve the throughput.
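
For reference, here is a minimal sketch of the relevant configuration
(assuming haproxy 1.5 directive names; the splice options and
tune.pipesize values are as discussed, the rest is illustrative):

```
global
    # larger pipe size for splice(), per your suggestion
    tune.pipesize 524288
    # splice is compiled in by default on Linux; -dS disables it at runtime

defaults
    mode http
    option splice-response   # splice server-to-client payload when possible
    option splice-auto       # let haproxy decide when splicing is worthwhile
```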

    The following settings were applied according to your suggestions:
1. kernel version: 3.5.0
2. haproxy version: 1.5-dev12
3. haproxy config added: tune.pipesize 524288
4. sysctl:
  net.ipv4.tcp_rmem = 4096        262144  16745216
  net.ipv4.tcp_wmem = 4096        262144  16745216
5. haproxy running on core 0, and network interrupts are sent to core 1.
6. LRO is enabled
Offload parameters for eth1(eth3):
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: on
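
The pinning in steps 5 and 6 can be sketched as shell commands (a
sketch only; the IRQ numbers are taken from the /proc/interrupts
listing further down, and the haproxy config path is an assumption):

```shell
# /proc/irq/<n>/smp_affinity takes a hex bitmask of allowed CPUs,
# so the mask for a single CPU is (1 << cpu) printed in hex.
cpu=1
mask=$(printf '%x' $((1 << cpu)))
echo "$mask"    # CPU1 -> mask "2"

# On the test box this would be applied to each eth1/eth3 TxRx IRQ
# (run as root on a live system):
#   for irq in 50 51 52 53 55 56 57 58; do
#       echo "$mask" > /proc/irq/$irq/smp_affinity
#   done
# and haproxy pinned to core 0:
#   taskset -c 0 haproxy -f /etc/haproxy/haproxy.cfg
```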

The following are CPU usage and interrupts on different cores:

top - 17:12:30 up 23:10,  3 users,  load average: 0.62, 0.60, 0.53
Tasks:  99 total,   3 running,  96 sleeping,   0 stopped,   0 zombie
Cpu0  :  4.8%us, 30.3%sy,  0.0%ni, 63.8%id,  0.0%wa,  0.3%hi,  0.7%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,  7.0%id,  0.0%wa,  2.3%hi, 90.6%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

           CPU0       CPU1       CPU2       CPU3
 50:         25   19986125         17          0   PCI-MSI-edge      eth1-TxRx-0
 51:          0   20002074         13          0   PCI-MSI-edge      eth1-TxRx-1
 52:          0   20004145         16          0   PCI-MSI-edge      eth1-TxRx-2
 53:          2   20004083         13          0   PCI-MSI-edge      eth1-TxRx-3
 54:          0          0          1          0   PCI-MSI-edge      eth1
 55:         14   16075935          7          0   PCI-MSI-edge      eth3-TxRx-0
 56:          3   16070740          3          0   PCI-MSI-edge      eth3-TxRx-1
 57:          5   16091911          3          0   PCI-MSI-edge      eth3-TxRx-2
 58:          4   16077275          3          0   PCI-MSI-edge      eth3-TxRx-3
 59:          2          0          0          0   PCI-MSI-edge      eth3
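
The per-CPU distribution is easy to check from /proc/interrupts; for
example, summing the CPU0 and CPU1 counts for the NIC queue IRQs (the
sample input below is two lines copied from the listing above; on a
live system you would read /proc/interrupts directly):

```shell
# Sum CPU0 ($2) and CPU1 ($3) interrupt counts for the TxRx queue IRQs.
awk '/TxRx/ { cpu0 += $2; cpu1 += $3 } END { print "CPU0:", cpu0, "CPU1:", cpu1 }' <<'EOF'
 50:         25   19986125         17          0   PCI-MSI-edge      eth1-TxRx-0
 55:         14   16075935          7          0   PCI-MSI-edge      eth3-TxRx-0
EOF
```

This prints "CPU0: 39 CPU1: 36062060" for the two sample lines,
confirming that almost all NIC interrupts land on CPU1.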

From these results, the network interrupts were indeed sent to CPU1
while haproxy was running on CPU0.

I am wondering what else I can do to explain this confusing result.
If I send all network interrupts to one core and run haproxy on
another, there may be CPU cache misses between them, so I am also
unsure about this setting.

BTW, you can ignore the result from my first letter that the
throughput was only 2 Gbps with 1 MB objects: after checking my
settings, I found that only two gigabit ports were enabled at the time.

Thank you!

Godbach
