On Wed, 12 Sep 2012 22:47:55 +0200 Dick Snippe <[email protected]> wrote:
> On Wed, Sep 12, 2012 at 04:05:02PM +0000, Brandeburg, Jesse wrote:
> > Hi Dick, we need to know exactly what you are expecting to happen
> > here.
>
> I'm surprised by the large increase in latency (from <1ms to >100ms).
> In our production environment we see this phenomenon even on "moderate"
> load, transmitting 1.5-2Gbit.

I believe you could maybe be seeing (I'd equivocate more if I could) a
bit of the "bufferbloat" effect from the large queues available by
default on the 10G interface.

Can you try running with smaller transmit descriptor rings?
  ethtool -G ethx tx 128

Can you separately try running without other offloads? Like:
  ethtool -K ethx lro off tso off gro off

> This effect on 10G infrastructure appears to be much more pronounced
> compared to 1G. When testing on 1G nics latency also increases, but
> much less so; from <1ms to ~10ms. A difference is that the 1G nics are
> saturated but the 10G ones are "only" transmitting ~1.5 Gbit.

That is a very interesting data point. Are your 1G NICs multi-queue?

> > There is a simple test you can do, try to disable TSO using
> > ethtool. ethtool -K ethx tso off
>
> I just tried that. The results are very similar.

Hm, you aren't getting any flow control in your network, are you? (See
ethtool -S ethx.) Take a look at the other stats while you are there.

It also might be interesting to sniff the ethx interface to see the
outbound traffic patterns and the delays between ping request/response:

  start your test
  start the ping
  tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
  bzip2 snippetx.cap
  <put on pastebin or some other web site and email us the link>

> > If that helps then we know that we need to pursue ways to get
> > your high priority traffic onto its own queue, which btw is why the
> > single thread iperf works. Ping goes to a different queue (by luck)
> > and gets out sooner due to not being behind other traffic
>
> Interestingly multi threaded iperf (iperf -P 50) manages to do +/-
> 7.5Gbit while ping latency is still around 0.1 - 0.3 ms.

That's only interesting if you're using all 16 queues; were you?

There are some games here with the scheduler and NIC irq affinity as
well that might be impacting us. Can you please make sure you killall
irqbalance, and run set_irq_affinity.sh ethx ethy? The goal here is to
start eliminating latency causes. I'd also be curious what your
interrupts per second per queue are during your workload.

Lastly, I'm headed out on vacation tonight and won't be available for a
while. I hope that someone else on my team will continue to work with
you to debug what is going on. Maybe someone here can reproduce the
issue and we will make much more progress. Any testing details like
kernel version, driver version, etc. will be helpful.
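
To pull the suggestions above together, here is a rough sequence of the
commands discussed (ethx is a placeholder for your 10G interface; the
ring size of 128 and the 1000-packet capture are just starting points,
and the ethtool -a check plus the grep filter are conveniences I am
assuming will help, not required steps):

  # 1) shrink the transmit descriptor ring so less data can sit queued
  ethtool -G ethx tx 128

  # 2) separately, disable the offloads so the NIC emits smaller bursts
  ethtool -K ethx lro off tso off gro off

  # 3) look for pause frames / flow control and other suspicious stats
  #    (the exact counter names vary by driver)
  ethtool -a ethx
  ethtool -S ethx | grep -i -e pause -e flow -e drop

  # 4) capture 1000 packets (128 bytes each) while the test and ping run
  tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
  bzip2 snippetx.cap

Re-run the ping measurement after each change so we can see which one,
if any, moves the latency.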
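
For the interrupts-per-second question, one quick way to eyeball it
(assuming the per-queue vectors are listed under the interface name in
/proc/interrupts, which is how the Intel drivers normally register
them) is:

  # pin the queue interrupts first so irqbalance doesn't move them
  killall irqbalance
  set_irq_affinity.sh ethx ethy

  # watch the per-queue counters tick; the change per refresh interval
  # is roughly the interrupt rate for each queue
  watch -n 1 -d 'grep ethx /proc/interrupts'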
