1 - With recent CPUs (Intel Xeon 5300/5400/5500/5600 and AMD Opteron 6100), the
set of optimal compiler settings for optimizations :) is not something anyone
can keep up with, not to mention that different versions of gcc understand
none, some or all of the features of these CPUs. -march=native lets gcc take on
the burden of choosing those compile-time settings, so if it could be added as
one of the options in the makefile, it would be helpful: I could use the same
"make..." line on every machine and it would self-adjust for that machine.
Obviously, this is not a setting that distros would use to spin package
binaries, but it is great for getting the optimal settings for a given
machine. Examples:
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
# cc -march=native -E -v - < /dev/null 2>&1 | fgrep cc1
/usr/libexec/gcc/x86_64-redhat-linux/4.4.5/cc1 -E -quiet -v - -march=core2
-mcx16 -msahf -mpopcnt -msse4.2 --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=core2
model name : AMD Opteron(tm) Processor 6172
# cc -march=native -E -v - < /dev/null 2>&1 | fgrep cc1
/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/cc1 -E -quiet -v - -march=amdfam10
-mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10
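
Besides reading the cc1 line, the effect of -march=native can be checked from
inside a program through the macros gcc predefines for each enabled extension.
The sketch below is only an illustration: the function name enabled_features()
and the choice of three macros are mine, not anything from haproxy, and the
list is far from complete.

```c
#include <assert.h>
#include <stddef.h>

/* Extensions gcc was told to use for this build. Each macro is predefined
 * by gcc only when the matching -m option (or a -march value implying it)
 * is in effect. The selection of three is illustrative, not exhaustive. */
static const char *const features[] = {
#ifdef __SSE4_2__
    "sse4.2",
#endif
#ifdef __POPCNT__
    "popcnt",
#endif
#ifdef __ABM__
    "abm",
#endif
    NULL   /* terminator, so the array is never empty */
};

/* Copy up to `max` enabled feature names into names[]; return the count. */
size_t enabled_features(const char *names[], size_t max)
{
    assert(names != NULL);
    size_t n = 0;
    for (size_t i = 0; features[i] != NULL && n < max; i++)
        names[n++] = features[i];
    return n;
}
```

Building the same file with and without -march=native and comparing the
result shows what the flag changed. On the E5520 above, whose cc1 line
includes -msse4.2 and -mpopcnt, a native build should report those two.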
2 - Google has pushed, via both TCP-related RFCs and patches to the Linux
kernel networking code, to allow the initial cwnd to be set as a socket
option. This would be a huge help to sites that communicate with the same
clients over and over and/or with many small requests, since it allows a full
response in one (or at least fewer) round trips. For one site that I work on
that is over 250 ms away with a very reliable gateway on the other end, I burn
through several round trips to deliver an icon/small gif/etc, when all the
necessary packets could be in flight before the first ack. It turns out the
small initial cwnd creates more traffic across the undersea cables than an
initial cwnd of 8 or 10 or 12.
http://www.amailbox.org/mailarchive/linux-netdev/2010/5/26/6278007
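
To put rough numbers on the round-trip cost, here is a small sketch of
idealized slow start. The assumptions are mine: no loss, no delayed ACKs, no
receive-window limit, and a window that doubles every round trip. The function
name rounds_to_send() is hypothetical, and handshake round trips are not
counted.

```c
#include <assert.h>

/* Round trips needed to deliver `segments` full-size segments when the
 * sender starts with a window of `initcwnd` segments and doubles it after
 * every round trip (idealized slow start, see assumptions above). */
unsigned rounds_to_send(unsigned segments, unsigned initcwnd)
{
    assert(initcwnd > 0);
    unsigned long long sent = 0, cwnd = initcwnd;
    unsigned rounds = 0;

    while (sent < segments) {
        sent += cwnd;   /* one window's worth of segments per round trip */
        cwnd *= 2;      /* slow start: window doubles once it is ACKed */
        rounds++;
    }
    return rounds;
}
```

With an assumed MSS of 1460 bytes, a 14 KB response is 10 segments.
rounds_to_send(10, 3) gives 3 data round trips with an initial window of 3
segments (the RFC 3390 value for that MSS), while rounds_to_send(10, 10)
gives 1. At 250 ms per round trip that is 750 ms versus 250 ms for the same
object.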
I also wanted to see if you were aware of two other recent kernel changes that
could be helpful to haproxy performance. The first could be helpful for the
new UNIX socket connections in recent haproxy versions:
Implementation of recvmmsg:
recvmmsg() is a new syscall that allows receiving, with a single syscall,
multiple messages that would otherwise require multiple calls to recvmsg().
For high-bandwidth, small-packet applications, throughput and latency are
improved greatly.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a2e2725541fad72416326798c2d7fa4dafb7d337
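
For reference, a minimal sketch of what a batched receive looks like with
this call. The helper name recv_batch() and the BATCH and MSG_MAX sizes are
made up for the example. It assumes a datagram socket, since that is where
message boundaries exist to batch, and a glibc new enough to expose
recvmmsg() under _GNU_SOURCE.

```c
#define _GNU_SOURCE          /* recvmmsg() and struct mmsghdr are GNU extensions */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH   8            /* illustrative: max datagrams per call */
#define MSG_MAX 512          /* illustrative: max bytes per datagram */

/* Receive up to BATCH already-queued datagrams from fd in one syscall.
 * bufs[i] gets the payload of message i and lens[i] its length.
 * Returns the number of datagrams received, or -1 with errno set
 * (EAGAIN/EWOULDBLOCK when nothing is queued). */
int recv_batch(int fd, char bufs[BATCH][MSG_MAX], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    assert(bufs != NULL && lens != NULL);
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = MSG_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];   /* one buffer per message */
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take what is queued now instead of blocking for more */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;           /* bytes received in message i */
    return n;
}
```

The same loop written with recvmsg() would cost one syscall per datagram;
here the kernel fills as many of the BATCH slots as it can in one call.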
The second is "RPS" (Receive Packet Steering) from Google, which improves
network processing performance with multiple CPUs. It is similar to MSI-X,
but Google found that both together performed even better than MSI-X alone:
http://kernelnewbies.org/Linux_2_6_35#head-94daf753b96280181e79a71ca4bb7f7a423e302a
http://lwn.net/Articles/362339/