On Wed, 12 Sep 2012 22:47:55 +0200 Dick Snippe <[email protected]> wrote:
> On Wed, Sep 12, 2012 at 04:05:02PM +0000, Brandeburg, Jesse wrote:
> > Hi Dick, we need to know exactly what you are expecting to happen
> > here.
>
> I'm surprised by the large increase in latency (from <1ms to >100ms).
> In our production environment we see this phenomenon even on "moderate"
> load, transmitting 1.5-2Gbit.

I believe you could maybe be seeing (I'd equivocate more if I could) a
bit of the "bufferbloat" effect from the large queues available by
default on the 10G interface.

Can you try running with smaller transmit descriptor rings?
  ethtool -G ethx tx 128

Can you separately try running without other offloads? Like:
  ethtool -K ethx lro off tso off gro off

> This effect on 10G infrastructure appears to be much more pronounced
> compared to 1G. When testing on 1G nics latency also increases, but
> much less so; from <1ms to ~10ms. A difference is that the 1G nics are
> saturated but the 10G ones are "only" transmitting ~1.5 Gbit.

That is a very interesting data point. Are your 1G NICs multi-queue?

> > There is a simple test you can do, try to disable TSO using
> > ethtool. ethtool -K ethx tso off
>
> I just tried that. The results are very similar.

Hm, you aren't getting any flow control in your network, are you? (See
ethtool -S ethx.) Take a look at the other stats while you are there.

It also might be interesting to sniff the ethx interface to see the
outbound traffic patterns and the delays between ping request/response:

  start your test
  start the ping
  tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
  bzip2 snippetx.cap
  <put on pastebin or some other web site and email us the link>

> > If that helps then we know that we need to pursue ways to get
> > your high priority traffic onto its own queue, which btw is why the
> > single thread iperf works. Ping goes to a different queue (by luck)
> > and gets out sooner due to not being behind other traffic
>
> Interestingly multi threaded iperf (iperf -P 50) manages to do +/-
> 7.5Gbit while ping latency is still around 0.1 - 0.3 ms.

That's only interesting if you're using all 16 queues; were you?

There are some games here with the scheduler and NIC irq affinity as
well that might be impacting us. Can you please make sure you killall
irqbalance, and run set_irq_affinity.sh ethx ethy? The goal here is to
start eliminating latency causes. I'd also be curious what your
interrupts per second per queue are during your workload.

Lastly, I'm headed out on vacation tonight and won't be available for a
while. I hope that someone else on my team will continue to work with
you to debug what is going on. Maybe someone here can reproduce the
issue and we will make much more progress. Any testing details like
kernel version, driver version, etc. will be helpful.
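
To pull the suggestions above together, here is a rough sequence of the
commands discussed (ethx is a placeholder for your 10G interface; the
ring size of 128 and the 1000-packet capture are just starting points,
and the ethtool -a check plus the grep filter are conveniences I am
assuming will help, not required steps):

  # 1) shrink the transmit descriptor ring so less data can sit queued
  ethtool -G ethx tx 128

  # 2) separately, disable the offloads so the NIC emits smaller bursts
  ethtool -K ethx lro off tso off gro off

  # 3) look for pause frames / flow control and other suspicious stats
  #    (the exact counter names vary by driver)
  ethtool -a ethx
  ethtool -S ethx | grep -i -e pause -e flow -e drop

  # 4) capture 1000 packets (128 bytes each) while the test and ping run
  tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
  bzip2 snippetx.cap

Re-run the ping measurement after each change so we can see which one,
if any, moves the latency.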
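
For the interrupts-per-second question, one quick way to eyeball it
(assuming the per-queue vectors are listed under the interface name in
/proc/interrupts, which is how the Intel drivers normally register
them) is:

  # pin the queue interrupts first so irqbalance doesn't move them
  killall irqbalance
  set_irq_affinity.sh ethx ethy

  # watch the per-queue counters tick; the change per refresh interval
  # is roughly the interrupt rate for each queue
  watch -n 1 -d 'grep ethx /proc/interrupts'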
