On Fri, Sep 14, 2012 at 10:40:53AM -0700, Alexander Duyck wrote:
> On 09/14/2012 05:22 AM, Dick Snippe wrote:
> > On Wed, Sep 12, 2012 at 05:10:44PM -0700, Jesse Brandeburg wrote:
> >
> >> On Wed, 12 Sep 2012 22:47:55 +0200
> >> Dick Snippe <dick.sni...@tech.omroep.nl> wrote:
> >>
> >>> On Wed, Sep 12, 2012 at 04:05:02PM +0000, Brandeburg, Jesse wrote:
> >>>
> >>>> Hi Dick, we need to know exactly what you are expecting to happen
> >>>> here.
> >>> I'm surprised by the large increase in latency (from <1ms to >100ms).
> >>> In our production environment we see this phenomenon even on "moderate"
> >>> load, transmitting 1.5-2Gbit.
> >> I believe you could be (I'd equivocate more if I could) seeing a bit
> >> of the "bufferbloat" effect from the large queues available by
> >> default on the 10G interface.
> >>
> >> Can you try running with smaller transmit descriptor rings?
> >> ethtool -G ethx tx 128
> > Not much difference:
> > |1000 packets transmitted, 1000 received, 0% packet loss, time 7386ms
> > |rtt min/avg/max/mdev = 48.522/76.642/93.488/6.404 ms, pipe 14
> > |Transfer rate: 168162.02 [Kbytes/sec] received
> >
> > However, if I retry with "ifconfig ethx txqueuelen 10" latency
> > (not throughput) looks better:
> > |1000 packets transmitted, 987 received, 1% packet loss, time 5905ms
> > |rtt min/avg/max/mdev = 0.443/17.018/42.106/8.075 ms, pipe 7
> > |Transfer rate: 132776.78 [Kbytes/sec] received
>
> That is to be expected. The txqueuelen would have a much larger impact
> than the Tx ring size since the qdisc can hold significantly more
> packets. You may want to look into enabling Byte Queue Limits (BQL) for
> control over the amount of data that is allowed to sit pending on the
> ring. That, in conjunction with a small txqueuelen, should help to
> reduce overall latency.

I was just looking into BQL; if I understand correctly, activating BQL
means writing a max value to
/sys/class/net/ethx/queues/tx-*/byte_queue_limits/limit_max. Am I right?

I can get some good results by tweaking both ifconfig txqueuelen and
byte_queue_limits/limit_max to rather extreme (small) values. With
txqueuelen 10 and limit_max=1024 I get 0.2 msec ping latency and almost
9 Gbit network throughput.
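For the record, this is roughly what I did to get those numbers (eth1 is
the 10G interface on this box; the byte_queue_limits entries are only
present when the running kernel has BQL support):

$ sudo ifconfig eth1 txqueuelen 10
$ echo 1024 | sudo tee /sys/class/net/eth1/queues/tx-*/byte_queue_limits/limit_max

i.e. keep the qdisc queue very short and cap the amount of data each Tx
queue may have pending on the ring at 1024 bytes.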
However, I have no idea what is going to happen when these settings are
applied to real-world conditions where we want high throughput for
internet-facing traffic and low latency for internal traffic (notably
memcached and NFS). Still, this looks promising, because bandwidth is
almost a factor of 10 better and latency almost a factor of 1000 better!

On this particular hardware we've got 2x 10G + 2x 1G NICs. Currently in
our production environment the 1G NICs aren't being used and all traffic
(both high-volume web serving traffic to the internet and internal NFS
and memcached traffic) is done over the 10G NICs (active/passive bond
with 802.1q VLANs). I could separate the flows: high-volume internet
traffic over a 10G bond and low-latency internal traffic over a 1G bond.
That would probably work for now, but it costs us an extra pair of 1G
switches and NFS traffic would be limited to 1G. Maybe I should look into
Transmit Packet Steering (XPS) to do the separation in software: 15
queues for volume output, 1 queue for low-latency traffic; however, I
haven't yet found out how to direct traffic to the right queue.

> Are you running a mixed 10G/1G network? If so, do you know what kind of
> buffering may be going on in your switch between the two fabrics? Are
> your tests being run between two 10G systems, or are you crossing over
> between 10G and 1G to conduct some of your tests? The reason I ask is
> that I have seen similar issues in the past when 1Gbps and 100Mbps were
> combined, resulting in latency increasing significantly any time one of
> the 100Mbps links was saturated.

All my 10G testing so far has been on strictly 10G networks, basically
between 2 servers in a blade enclosure with only one 10G switch between
them, i.e. the traffic doesn't even leave the blade enclosure.

> One interesting data point might be to test the latency with the 1G
> ports and 10G ports isolated from each other to see if this may be an
> issue of buffer bloat between the two traffic rates introducing a delay.

There is no mixing between 10G and 1G taking place. The 1G tests were
done on 1G NICs connected by a 1G switch. In both cases the setup was
NIC1 <-> switch <-> NIC2.

> >>>> If that helps then we know that we need to pursue ways to get
> >>>> your high priority traffic onto its own queue, which btw is why the
> >>>> single thread iperf works. Ping goes to a different queue (by luck)
> >>>> and gets out sooner due to not being behind other traffic.
> >>> Interestingly, multi-threaded iperf (iperf -P 50) manages to do +/-
> >>> 7.5Gbit while ping latency is still around 0.1 - 0.3 ms.
> >> That's only interesting if you're using all 16 queues; were you?
> > I'm not sure. How can I check how many queues I'm using?
>
> You can verify how many queues you are using by viewing ethtool -S
> results for the interface while passing traffic. Any of the Tx queues
> that have incrementing packet counts are in use.

Yes, thanks. It turns out that with iperf -P 50 not all queues are being
used, so it makes sense that ping latency stays low in that case.
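For the record, I watched the per-queue counters with something along
these lines while iperf was running (eth1 is the interface under test;
the tx_queue_*_packets stat names are what this ixgbe version exposes,
other drivers may name them differently):

$ watch -d -n 1 "ethtool -S eth1 | grep -E 'tx_queue_[0-9]+_packets'"

watch -d highlights the counters that changed since the previous refresh,
so the queues actually carrying traffic stand out.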
> >> Lastly, I'm headed out on vacation tonight and won't be available
> >> for a while. I hope that someone else on my team will continue to
> >> work with you to debug what is going on.
> > Have a nice vacation!
> > If someone else could help me with this issue, that would be great.
>
> As you can probably tell from the fact that I am replying, I will step
> in while Jesse is out to help you resolve this.

This is much appreciated!

> >> Maybe someone here can reproduce the issue and we will make much more
> >> progress. Any testing details like kernel version, driver version,
> >> etc. will be helpful.
> > $ uname -r
> > 3.5.3-2POi-x86_64 (we compile our own kernels, this is a vanilla
> > kernel.org kernel; /proc/config.gz attached)
> > $ sudo ethtool -i eth1
> > driver: ixgbe
> > version: 3.9.15-k
> > firmware-version: 0x613e0001
> > bus-info: 0000:15:00.1
> So the driver is just the stock ixgbe driver included with the 3.5.3
> kernel then?

Correct.

> If so, that makes it a bit easier to debug since we know exactly what
> code we are working with if this does turn out to be a driver issue.

If needed I can test other kernel/driver/whatever versions if that makes
debugging easier for you. If I understand correctly, so far the driver is
operating as intended; it's just that my assumptions (low latency + high
throughput, a.k.a. "have your cake and eat it too") are overly
optimistic(?)

-- 
Dick Snippe, internetbeheerder                    \ fight war
beh...@omroep.nl, +31 35 677 3555                  \ not wars
NPO ICT, Sumatralaan 45, 1217 GP Hilversum, NPO Gebouw A