Hi, Willy
I have run performance tests on haproxy again today.
My tester has eight gigabit ports: four gigabit ports aggregated
to emulate clients, and the other four gigabit ports aggregated to
emulate servers. So the maximum throughput is expected to be 4 Gbps.
Whether splice is enabled or disabled (either at runtime with the -dS
option or at compile time), the throughput is roughly 2.8 Gbps
under the following conditions:
1) HTTP object size is 1 MB
2) maximum concurrent sessions is 10,000
3) one HTTP transaction per connection.
Enabling splice did not improve the throughput.
The following settings were applied according to your suggestions:
1. kernel version: 3.5.0
2. haproxy version: 1.5-dev12
3. haproxy config added: tune.pipesize 524288
4. sysctl:
net.ipv4.tcp_rmem = 4096 262144 16745216
net.ipv4.tcp_wmem = 4096 262144 16745216
5. haproxy is pinned to core 0, and network interrupts are steered to core 1.
6. LRO is enabled.
Offload parameters for eth1(eth3):
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: on
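For reference, here is a sketch of how the settings above can be applied from a shell. The interface names, IRQ range, and CPU masks are taken from the output in this mail and are assumptions that would need adjusting on another machine; this is not a definitive recipe.

```shell
#!/bin/sh
# Enlarge TCP buffers (values from item 4 above)
sysctl -w net.ipv4.tcp_rmem="4096 262144 16745216"
sysctl -w net.ipv4.tcp_wmem="4096 262144 16745216"

# Enable LRO/GRO on the client- and server-facing NICs (item 6)
ethtool -K eth1 lro on gro on
ethtool -K eth3 lro on gro on

# Steer all NIC queue interrupts to core 1 (CPU mask 0x2);
# IRQs 50-59 are the eth1/eth3 vectors shown in /proc/interrupts below
for irq in $(seq 50 59); do
    echo 2 > /proc/irq/$irq/smp_affinity
done

# Pin haproxy to core 0 (CPU mask 0x1)
taskset -c 0 haproxy -f /etc/haproxy/haproxy.cfg
```

The tune.pipesize directive itself goes in the global section of haproxy.cfg, and splice can be disabled at runtime with -dS to compare the two cases against the same configuration.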
The following are the CPU usage and interrupt counts on the different cores:
top - 17:12:30 up 23:10, 3 users, load average: 0.62, 0.60, 0.53
Tasks: 99 total, 3 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0 : 4.8%us, 30.3%sy, 0.0%ni, 63.8%id, 0.0%wa, 0.3%hi, 0.7%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 7.0%id, 0.0%wa, 2.3%hi, 90.6%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
CPU0 CPU1 CPU2 CPU3
50: 25 19986125 17 0 PCI-MSI-edge eth1-TxRx-0
51: 0 20002074 13 0 PCI-MSI-edge eth1-TxRx-1
52: 0 20004145 16 0 PCI-MSI-edge eth1-TxRx-2
53: 2 20004083 13 0 PCI-MSI-edge eth1-TxRx-3
54: 0 0 1 0 PCI-MSI-edge eth1
55: 14 16075935 7 0 PCI-MSI-edge eth3-TxRx-0
56: 3 16070740 3 0 PCI-MSI-edge eth3-TxRx-1
57: 5 16091911 3 0 PCI-MSI-edge eth3-TxRx-2
58: 4 16077275 3 0 PCI-MSI-edge eth3-TxRx-3
59: 2 0 0 0 PCI-MSI-edge eth3
From these results, the network interrupts were indeed handled on CPU1
while haproxy was running on CPU0.
I am wondering what else I can do to explain this confusing result.
If all network interrupts are sent to one core while haproxy runs on
another core, there may be CPU cache misses, so I am also unsure
whether this setting is the right one.
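One way to check whether cross-core cache misses are actually a factor would be to sample cache counters on the haproxy process while traffic is flowing. A minimal sketch using perf, assuming a single haproxy process is running (adjust the pid lookup otherwise):

```shell
# Count cache references/misses of the running haproxy process
# for 30 seconds under load; compare the miss ratio between the
# pinned and unpinned IRQ configurations
perf stat -e cache-references,cache-misses \
    -p "$(pidof haproxy)" -- sleep 30
```

If the miss ratio rises noticeably when haproxy and the IRQs are split across cores, that would support the cache-locality concern.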
BTW, please ignore the result from my first letter where the throughput
was only 2 Gbps with a 1 MB object size: after checking my settings I
found that only two gigabit ports had been enabled.
Thank you!
Godbach