1 - With recent CPUs (Intel 5300/5400/5500/5600 and AMD 6100), the set of optimal compiler optimization settings :) is not something anyone can keep up with, not to mention that different versions of gcc understand none, some or all of the features of these CPUs. -march=native lets gcc take on the burden of picking those settings at compile time, so it would be helpful if it could be added as one of the options in the makefile: I could then use the same "make..." line on every machine and it would self-adjust for that machine. Obviously, this is not a setting that distros would use to spin package binaries, but it is great for getting the optimal settings for a given machine. Examples:

model name      : Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz

# cc -march=native -E -v - < /dev/null 2>&1 | fgrep cc1

/usr/libexec/gcc/x86_64-redhat-linux/4.4.5/cc1 -E -quiet -v - -march=core2 -mcx16 -msahf -mpopcnt -msse4.2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=core2

model name      : AMD Opteron(tm) Processor 6172

[r...@hesj3-m41 cron.d]# cc -march=native -E -v - < /dev/null 2>&1 | fgrep cc1

/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/cc1 -E -quiet -v - -march=amdfam10 -mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10
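
To compare what -march=native resolves to across a handful of machines, the cc1 line above can be split into its parts. This is only a rough sketch: parse_cc1_flags is a name I made up for illustration, and it assumes the cc1 line looks like the two examples above (other gcc versions may print it differently).

```python
import shlex

# cc1 line copied from the E5520 example above.
E5520_LINE = (
    "/usr/libexec/gcc/x86_64-redhat-linux/4.4.5/cc1 -E -quiet -v - "
    "-march=core2 -mcx16 -msahf -mpopcnt -msse4.2 "
    "--param l1-cache-size=32 --param l1-cache-line-size=64 "
    "--param l2-cache-size=8192 -mtune=core2"
)


def parse_cc1_flags(cc1_line):
    """Split a cc1 command line into the target options gcc selected.

    Hypothetical helper: assumes the format shown in the examples above.
    """
    tokens = shlex.split(cc1_line)
    result = {"march": None, "mtune": None, "features": [], "params": {}}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-march="):
            result["march"] = tok[len("-march="):]
        elif tok.startswith("-mtune="):
            result["mtune"] = tok[len("-mtune="):]
        elif tok == "--param" and i + 1 < len(tokens):
            # "--param name=value" arrives as two separate tokens.
            name, _, value = tokens[i + 1].partition("=")
            result["params"][name] = value
            i += 1
        elif tok.startswith("-m"):
            # Remaining -m options are per-feature switches such as -mpopcnt.
            result["features"].append(tok)
        i += 1
    return result


if __name__ == "__main__":
    print(parse_cc1_flags(E5520_LINE))
```

Feeding it the output of the "cc -march=native -E -v - < /dev/null 2>&1 | fgrep cc1" command from each host makes the differences between machines easy to diff.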


2 - Google has pushed, via both TCP-related RFCs and patches to the Linux kernel networking code, to allow the initial cwnd to be set as a socket option. This would be a huge help to sites that communicate with the same clients over and over and/or with many small requests, since it allows a full response in one (or at least fewer) round trips. For one site that I work on that is over 250 ms away, with a very reliable gateway on the other end, I burn through several round trips to deliver an icon, a small gif, etc. Such an icon could have all the necessary packets in flight before the first ack. It turns out the small initial cwnd creates more traffic across the undersea cables than an initial cwnd of 8 or 10 or 12.

http://www.amailbox.org/mailarchive/linux-netdev/2010/5/26/6278007
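
To put rough numbers on the round-trip cost, here is a back-of-the-envelope sketch. It is a simplified model, not how any kernel computes things: it assumes an MSS of 1460 bytes, no loss, a cwnd that doubles every round trip during slow start, and it ignores the handshake, delayed ACKs and the receiver window. The function name round_trips is mine.

```python
import math


def round_trips(response_bytes, initial_cwnd, mss=1460):
    """Estimate round trips needed to send a response under idealized slow start.

    Assumptions (illustrative only): no loss, cwnd doubles each round trip,
    handshake, delayed ACKs and receiver window are ignored.
    """
    if initial_cwnd < 1:
        raise ValueError("initial_cwnd must be at least 1 segment")
    segments = math.ceil(response_bytes / mss)
    cwnd = initial_cwnd
    sent = 0
    trips = 0
    while sent < segments:
        sent += cwnd   # one full window goes out per round trip
        cwnd *= 2      # slow start: the window doubles once the ACKs return
        trips += 1
    return trips


if __name__ == "__main__":
    size = 14600  # a 10-segment response, e.g. a small image
    for iw in (3, 10):
        print(f"initial cwnd {iw}: {round_trips(size, iw)} round trip(s)")
```

Under these assumptions a 14600-byte response takes three round trips with an initial cwnd of 3 segments and one with an initial cwnd of 10. Multiply the difference by the path RTT to get the latency saved per response.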

I also wanted to see if you were aware of two other recent kernel changes that could be helpful to haproxy performance. The first could be helpful for the new UNIX socket connections in recent haproxy versions:

Implementation of recvmmsg:
recvmmsg() is a new syscall that receives, in a single call, multiple messages that would otherwise require multiple calls to recvmsg(). For high-bandwidth, small-packet applications, throughput and latency are improved greatly.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a2e2725541fad72416326798c2d7fa4dafb7d337

The second is "RPS" from Google, which improves network processing performance with multiple CPUs. It is similar to MSI-X, but Google found that both together performed even better than MSI-X alone:

http://kernelnewbies.org/Linux_2_6_35#head-94daf753b96280181e79a71ca4bb7f7a423e302a

http://lwn.net/Articles/362339/


