Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen
I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly longer), although the long-term average is down around 14-16K
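For a rough sense of scale, the burst figures quoted above can be turned into ring-occupancy numbers. This is only a back-of-the-envelope sketch: the 256-descriptor ring is an assumed value picked for illustration (the thread does not state the configured size), and the function names are made up here.

```python
# Back-of-the-envelope burst arithmetic for an rx ring.
# ASSUMPTION: RING_SIZE is illustrative; the thread does not give the real value.

BURST_RATE_PPS = 222_000   # peak rate reported in the message
BURST_DURATION_S = 0.0011  # "at least 1.1 ms"
RING_SIZE = 256            # assumed rx descriptor count, for illustration only


def packets_in_burst(rate_pps: float, duration_s: float) -> float:
    """Packets that arrive during a burst of the given length."""
    return rate_pps * duration_s


def max_stall_ms(ring_size: int, rate_pps: float) -> float:
    """Longest time rx processing can be delayed before an empty ring fills."""
    return ring_size / rate_pps * 1000.0


if __name__ == "__main__":
    print(round(packets_in_burst(BURST_RATE_PPS, BURST_DURATION_S)))
    print(round(max_stall_ms(RING_SIZE, BURST_RATE_PPS), 2))
```

Under these assumptions a 1.1 ms burst alone is roughly one ring's worth of packets, so any delay in rx processing of about a millisecond during such a burst would be enough to overflow a ring of that size.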

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Ben Greear
Chris Friesen wrote: Is there anything else we can do to minimize the latency of network packet processing and avoid having to crank the rx ring size up so high? Why is it such a big deal to crank up the rx queue length? Seems like a perfectly normal way to handle bursts like this...

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Eric Dumazet
Chris Friesen wrote: I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly longer), although the long-term average is down

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen
Ben Greear wrote: Chris Friesen wrote: Is there anything else we can do to minimize the latency of network packet processing and avoid having to crank the rx ring size up so high? Why is it such a big deal to crank up the rx queue length? Seems like a perfectly normal way to handle bursts

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen
Eric Dumazet wrote: Chris Friesen wrote: I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly longer), although the

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Ben Greear
Chris Friesen wrote: Ben Greear wrote: Chris Friesen wrote: Is there anything else we can do to minimize the latency of network packet processing and avoid having to crank the rx ring size up so high? Why is it such a big deal to crank up the rx queue length? Seems like a perfectly normal

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Eric Dumazet
Chris Friesen wrote: Eric Dumazet wrote: Chris Friesen wrote: I've done some further digging, and it appears that one of the problems we may be facing is very high instantaneous traffic rates. Instrumentation showed up to 222K packets/sec for short periods (at least 1.1 ms, possibly

Re: questions on NAPI processing latency and dropped network packets

2008-01-16 Thread Willy Tarreau
On Wed, Jan 16, 2008 at 07:58:36AM +0100, Jarek Poplawski wrote: On Wed, Jan 16, 2008 at 11:17:08AM +1100, Herbert Xu wrote: ... Well people are always going to operate on this model for commercial reasons. FWIW I used to work for a company that stuck to a specific version of the Linux

Re: questions on NAPI processing latency and dropped network packets

2008-01-16 Thread Jarek Poplawski
On Wed, Jan 16, 2008 at 09:04:58PM +0100, Willy Tarreau wrote: ... you can work with latest release provided that you always have a fallback to an earlier one. That way, you don't bet too much on something you don't completely control. If it works, it tells you you'll be able to completely

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Chris Friesen
Jarek Poplawski wrote: IMHO, checking this with a current stable, which probably you are going to do some day, anyway, should be 100% acceptable: giving some input to netdev, while still working for yourself. While I would love to do this, it's not that simple. Some of our hardware is not

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Vlad Yasevich
Chris Friesen wrote: Eric Dumazet wrote: Chris Friesen wrote: Based on the profiling information we're spending time in sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I can't see how the hash would be related. I'm pretty new to SCTP though, so I may be missing

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread AstralStorm
On Tue, 15 Jan 2008 08:47:07 -0600 Chris Friesen wrote: Jarek Poplawski wrote: IMHO, checking this with a current stable, which probably you are going to do some day, anyway, should be 100% acceptable: giving some input to netdev, while still working for yourself.

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Chris Friesen
Radoslaw Szkodzinski (AstralStorm) wrote: On Tue, 15 Jan 2008 08:47:07 -0600 Chris Friesen wrote: Some of our hardware is not supported on mainline, so we need per-kernel version patches to even bring up the blade. The blades netboot via a jumbo-frame network, so kernel

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Eric Dumazet
On Tue, 15 Jan 2008 11:14:25 -0600 Chris Friesen wrote: Radoslaw Szkodzinski (AstralStorm) wrote: On Tue, 15 Jan 2008 08:47:07 -0600 Chris Friesen wrote: Some of our hardware is not supported on mainline, so we need per-kernel version patches to even

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Jarek Poplawski
On Tue, Jan 15, 2008 at 08:47:07AM -0600, Chris Friesen wrote: Jarek Poplawski wrote: IMHO, checking this with a current stable, which probably you are going to do some day, anyway, should be 100% acceptable: giving some input to netdev, while still working for yourself. While I would love

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Herbert Xu
Jarek Poplawski wrote: So, it was more a rhetorical trick (sorry!) to suggest, that such a business model of being always late with kernels might be quite practical and reasonable for many companies, but looks like the worst possible development model for Linux. Well people

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Jarek Poplawski
On Wed, Jan 16, 2008 at 11:17:08AM +1100, Herbert Xu wrote: ... Well people are always going to operate on this model for commercial reasons. FWIW I used to work for a company that stuck to a specific version of the Linux kernel, and I suppose I still do even now :) But the important thing

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen
Ray Lee wrote: On Jan 10, 2008 9:24 AM, Chris Friesen wrote: After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool: Can you reproduce it

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen
David Miller wrote: From: Chris Friesen Date: Fri, 11 Jan 2008 08:59:26 -0600 I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time. And by asking here for people to dig into it for you,

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Eric Dumazet
Chris Friesen wrote: Ray Lee wrote: On Jan 10, 2008 9:24 AM, Chris Friesen wrote: After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool:

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Eric Dumazet
Chris Friesen wrote: Eric Dumazet wrote: Chris Friesen wrote: Based on profiling and instrumentation it seems like the cost of sctp_endpoint_lookup_assoc() more than triples, which means that the amount of time that bottom halves are disabled in that function also triples. Any idea

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen
Eric Dumazet wrote: Chris Friesen wrote: Based on profiling and instrumentation it seems like the cost of sctp_endpoint_lookup_assoc() more than triples, which means that the amount of time that bottom halves are disabled in that function also triples. Any idea of the size of sctp hash

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen
Eric Dumazet wrote: Chris Friesen wrote: Based on the profiling information we're spending time in sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I can't see how the hash would be related. I'm pretty new to SCTP though, so I may be missing something. Well, it does

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Jarek Poplawski
On 14-01-2008 16:58, Chris Friesen wrote: ... How close to bleeding edge do we need to be for it to be considered acceptable to ask questions on netdev? Given that the embedded space tends to be perpetually stuck on older kernels (our current release is based on 2.6.14) do you have any

Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Chris Friesen
David Miller wrote: You have to be kidding, coming here for help with a nearly 4 year old kernel. I figured it couldn't hurt to ask...if I can't ask the original authors, who else is there? I'd love to work on newer kernels, but we have a commitment to our customers to support multiple

Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Herbert Xu
Chris Friesen wrote: I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time. Since you've made the commitment, you should stick to it and resolve the issues without asking us to contribute.

Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread David Miller
From: Chris Friesen Date: Fri, 11 Jan 2008 08:59:26 -0600 I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time. And by asking here for people to dig into it for you, you are asking people

Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Ray Lee
On Jan 10, 2008 9:24 AM, Chris Friesen wrote: After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool: (These are perhaps too obvious, but I

questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen
Hi all, I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough. After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware
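One way to watch the kind of counters mentioned above is to take two snapshots of `name: value` statistics text (the layout `ethtool -S` prints) and report which counters rose. The sketch below is illustrative only: the sample text, counter values, and function names are invented for the example and are not taken from the thread.

```python
# Compare two snapshots of "name: value" statistics and report rising counters.
# ASSUMPTION: the sample snapshots below are made up for illustration.

SAMPLE_BEFORE = """NIC statistics:
     rx_packets: 1000
     rx_fifo_errors: 5
     rx_missed_errors: 5
"""

SAMPLE_AFTER = """NIC statistics:
     rx_packets: 2000
     rx_fifo_errors: 9
     rx_missed_errors: 5
"""


def parse_counters(text: str) -> dict:
    """Parse 'name: value' lines; lines without a numeric value are skipped."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def rising(before: dict, after: dict) -> dict:
    """Return the increase for every counter that grew between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    delta = rising(parse_counters(SAMPLE_BEFORE), parse_counters(SAMPLE_AFTER))
    for name, increase in sorted(delta.items()):
        print(f"{name}: +{increase}")
```

In practice the two snapshots would come from running the statistics command a few seconds apart; filtering the result to names containing "drop", "fifo", or "err" narrows it to the counters of interest.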

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Kok, Auke
Chris Friesen wrote: Hi all, I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough. After a recent userspace app change, we've started seeing packets being dropped

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen
Kok, Auke wrote: You're using 2.6.10... you can always replace the e1000 module with the out-of-tree version from e1000.sf.net, this might help a bit - the version in the 2.6.10 kernel is very very old. Do you have any reason to believe this would improve things? It seems like the problem

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread James Chapman
Chris Friesen wrote: Hi all, I've got an issue that's popped up with a deployed system running 2.6.10. I'm looking for some help figuring out why incoming network packets aren't being processed fast enough. After a recent userspace app change, we've started seeing packets being dropped

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Rick Jones
1) Interrupts are being processed on both cpus: [EMAIL PROTECTED]:/root cat /proc/interrupts CPU0 CPU1 30:17037564530785 U3-MPIC Level eth0 IIRC none of the e1000 driven cards are multi-queue, so while the above shows that interrupts from eth0 have been

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Kok, Auke
Chris Friesen wrote: Kok, Auke wrote: You're using 2.6.10... you can always replace the e1000 module with the out-of-tree version from e1000.sf.net, this might help a bit - the version in the 2.6.10 kernel is very very old. Do you have any reason to believe this would improve things? It

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Kok, Auke
Rick Jones wrote: 1) Interrupts are being processed on both cpus: [EMAIL PROTECTED]:/root cat /proc/interrupts CPU0 CPU1 30:17037564530785 U3-MPIC Level eth0 IIRC none of the e1000 driven cards are multi-queue the pci-express variants are, but the

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen
James Chapman wrote: What's changed in your application? Any real-time threads in there? From the top output below, looks like SigtranServices is consuming all your CPU... There are two cpus, and SigtranServices is multithreaded with many threads. Most of these threads are affined to

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread David Miller
From: Chris Friesen Date: Thu, 10 Jan 2008 11:24:19 -0600 I've got an issue that's popped up with a deployed system running 2.6.10. ... So...anyone have any ideas/suggestions? You have to be kidding, coming here for help with a nearly 4 year old kernel. The networking