On Thu, 2012-10-04 at 15:40 +0200, Dick Snippe wrote:
> On Tue, Sep 18, 2012 at 12:55:02PM +0200, Dick Snippe wrote:
> 
> FYI:
> 
> > For our production platform I will try some experiments with decreased
> > txqueuelen, binding (web)server instances to specific cores and booting
> > a server with kernel 3.5 + fq_codel to see what works best in practice.
> 
> After quite a bit of testing + some real-world experience on our
> production platform it turns out that tweaking Receive Packet Steering
> (rps) in combination with fq_codel can have a huge impact.
> 
> My theory (based on the results below) is that when a server is sending
> out large volumes of data, the return traffic (ACKs) can become so
> large (150,000 packets/second) that the driver switches to polling.
> When RPS is not active, this polling apparently causes a drop in throughput
> and all TX queues fill up, resulting in much higher latency.
> 
> Below are our results.
> 
> Our test setup consisted of 4 servers (IBM HS22 blades, 96 GB RAM, 2x
> quad-core Westmere E5620, dual 82599EB 10-Gigabit NICs, running a vanilla
> kernel.org 3.5.4 kernel with the stock ixgbe 3.9.15-k driver).
> 
> host1: runs the dltest webserver, serving a 100 MB test file
> host2+3: act as clients using the ab test program:
>       ab -n 100000 -c 500 http://dltest.omroep.nl/100m
> host4: "observation server", measuring ping latency to host1:
>       sudo ping -c 1000 -i 0.001 -q dltest.omroep.nl
> 
> All 4 servers are directly connected through a Cisco Nexus 4001I
> 10-Gigabit switch.  The test servers are in the same blade enclosure and
> the test traffic never leaves the blade enclosure.
> 
> With default settings and a small number of flows (ab -c 10)
> we can reach line speed easily and ping latency stays low (<1 ms).
> The same goes for iperf.
> 
> However, with a larger number of flows (both clients doing ab -c 500,
> i.e. a total of 1000 concurrent flows) throughput on the webserver
> drops dramatically to 1-2 Gbit/s and ping latency rises to ~100 ms.
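
For readers wanting to reproduce the RPS + fq_codel setup described above,
it comes down to something like this (eth0, the rx queue indices and the
CPU mask are placeholders; pick a mask that matches your core/NUMA layout):

      # spread receive processing for each rx queue over CPUs 0-7 (mask 0xff)
      for q in /sys/class/net/eth0/queues/rx-*/rps_cpus; do echo ff > $q; done
      # replace the default root qdisc with fq_codel on the egress interface
      tc qdisc replace dev eth0 root fq_codel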

I wonder if the receivers are using GRO?

If so, the number of ACKs they send back should be limited to one
ACK per GRO packet, instead of one ACK for every 2 MSS.
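
For the record, GRO state on the receivers can be checked and toggled with
ethtool (eth0 is a placeholder for the actual interface):

      ethtool -k eth0 | grep generic-receive-offload
      ethtool -K eth0 gro on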

Also, I was considering adding GRO support for pure TCP ACKs, at least for
local traffic (not for forwarding workloads).

It would be nice if you could post "perf top" output from the sender,
because dropping to 1-2 Gbit/s sounds really, really bad...
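
Something along these lines, run on the sender while both ab clients are
loading it, would be enough (the sleep duration is arbitrary):

      perf top -g
      # or, to capture a profile you can paste:
      perf record -a -g -- sleep 30
      perf report --stdio | head -n 50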



