On Tue, Jul 27, 2010 at 09:42:16AM -0700, Alexander Duyck wrote:
> The fact that the ring size seems to affect the number of packets
> dropped per second implies that there may be some sort of latency issue.
> One thing you might try is different values for rx-usecs via ethtool
> -C. You may find that fixing the value at something fairly low like 33
> usecs per interrupt may help to reduce the number of rx_fifo_errors.
"ethtool -C eth0 rx-usecs 33" is accepted, but "ethtool -c eth0" shows
the values unchanged. This is with igb-2.2.9.
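For reference, the exact sequence:

ethtool -C eth0 rx-usecs 33       # accepted without error
ethtool -c eth0 | grep rx-usecs   # still reports the old value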
> After looking over your lspci dump I am assuming you are running a
> Supermicro motherboard with the AMD SR5690/SP5100 chipset. If that is
> the case you will probably find that one physical ID works much better
> than the other for network performance because the SR5690 that the 82576
> is connected to is going to be node local for one of the sockets and
> remote for the other.
>
> Another factor you will need to take into account is that the ring
> memory should be allocated on the same node the hardware is on. You
> should be able to accomplish that by using taskset with the correct CPU
> mask for the physical ID you are using when calling modprobe/insmod and
> the ifconfig commands to bring up the interfaces. This should help to
> decrease the memory latency and increase the throughput available to the
> adapter.
The taskset/modprobe trick, along with putting all the queues on the
same physical CPU, seems to provide the best performance with these
igb-2.2.9 settings:
taskset 0002 modprobe igb RSS=0,0 InterruptThrottleRate=3,3
r...@big-tester:~# eth_affinity_tool show eth0 eth1
16 CPUs detected
eth0: ffff 0001 0002 0004 0008 0010 0020 0040 0080
eth1: ffff 0001 0002 0004 0008 0010 0020 0040 0080
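For completeness, everything in the bring-up goes through taskset so
the ring memory is allocated on the right node, roughly:

taskset 0002 modprobe igb RSS=0,0 InterruptThrottleRate=3,3
taskset 0002 ifconfig eth0 up
taskset 0002 ifconfig eth1 up
# then pin each queue's interrupt to one core on that socket; the
# per-queue masks shown above end up in /proc/irq/*/smp_affinity,
# e.g. (IRQ numbers are examples, check /proc/interrupts for the
# real ones):
echo 0001 > /proc/irq/45/smp_affinity   # eth0 queue 0
echo 0002 > /proc/irq/46/smp_affinity   # eth0 queue 1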
> Thanks for the information. One other item I would be interested in
> seeing is the kind of numbers we are talking about. If you could
> provide me with an ethtool -S dump from 10 seconds of one of your
> tests, that might be useful for me to better understand the kind of
> pressures the system is under.
Here's a sample 10-second run of "ethtool -S" on the receiving
interface, piped through beforeafter, so each number below is the
change in that counter over the 10-second interval; dividing by 10
gives the per-second rate. Counters that don't appear didn't change
during the run.
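The capture itself is just two snapshots and a diff, along these
lines:

ethtool -S eth0 > stats.before
sleep 10
ethtool -S eth0 > stats.after
beforeafter stats.before stats.after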
In this interval, we're sending far more packets than
can be processed:
Interval from 20100727.174619 to 20100727.174629
NIC statistics:
rx_packets: 11595293
rx_bytes: 742098688
rx_long_byte_count: 742098688
rx_fifo_errors: 163800
rx_queue_0_packets: 754216
rx_queue_0_bytes: 45252960
rx_queue_0_drops: 20475
rx_queue_1_packets: 760734
rx_queue_1_bytes: 45644040
rx_queue_1_drops: 20475
rx_queue_2_packets: 736546
rx_queue_2_bytes: 44192760
rx_queue_2_drops: 20475
rx_queue_3_packets: 742368
rx_queue_3_bytes: 44542080
rx_queue_3_drops: 20475
rx_queue_4_packets: 661758
rx_queue_4_bytes: 39705480
rx_queue_4_drops: 20475
rx_queue_5_packets: 713095
rx_queue_5_bytes: 42785706
rx_queue_5_drops: 20475
rx_queue_6_packets: 696702
rx_queue_6_bytes: 41802120
rx_queue_6_drops: 20475
rx_queue_7_packets: 705726
rx_queue_7_bytes: 42343560
rx_queue_7_drops: 20475
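As a sanity check, rx_fifo_errors here is exactly the sum of the
per-queue drops: 8 queues x 20475 = 163800, i.e. roughly 16,000 drops
per second against about 1.16 million packets per second received.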
And in this interval, we're sending packets just a bit faster than they
can be processed:
Interval from 20100727.175747 to 20100727.175757
NIC statistics:
rx_packets: 3877553
rx_bytes: 248163392
rx_long_byte_count: 248163392
rx_fifo_errors: 6515
rx_queue_0_packets: 484608
rx_queue_0_bytes: 29076480
rx_queue_0_drops: 81
rx_queue_1_packets: 484690
rx_queue_1_bytes: 29081400
rx_queue_2_packets: 484690
rx_queue_2_bytes: 29081400
rx_queue_3_packets: 484360
rx_queue_3_bytes: 29061600
rx_queue_3_drops: 342
rx_queue_4_packets: 483207
rx_queue_4_bytes: 28992420
rx_queue_4_drops: 1497
rx_queue_5_packets: 480183
rx_queue_5_bytes: 28810980
rx_queue_5_drops: 4521
rx_queue_6_packets: 484615
rx_queue_6_bytes: 29076900
rx_queue_6_drops: 74
rx_queue_7_packets: 484690
rx_queue_7_bytes: 29081400
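rx_fifo_errors again matches the sum of the per-queue drops
(81 + 342 + 1497 + 4521 + 74 = 6515), but this time they're spread
very unevenly: queue 5 is dropping about 450 packets per second while
queues 1, 2, and 7 drop nothing.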
Again, to keep things readable, my counter-processing script doesn't
list statistics that didn't increment during the 10-second interval,
so any "ethtool -S" stats not listed here were all zero.
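The script is nothing fancy; a minimal sketch of the same filtering,
given two saved snapshots, would be something like:

# print only the counters that changed between two ethtool -S snapshots
awk 'NR==FNR { before[$1] = $2; next }
     ($1 in before) && ($2 != before[$1]) { print $1, $2 - before[$1] }' \
    stats.before stats.after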
Note that the traffic I'm using to get these numbers (lots of small
UDP packets) is a denial-of-service scenario. We're more interested in
real-world routing performance, but that's harder to simulate, and we
need to be able to handle DoS attacks, so this is the benchmark we're
using.
Thanks,
-- Ed