Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

Andrew Morton Sat, 17 Jun 2006 16:56:26 -0700

On Sat, 17 Jun 2006 16:23:34 -0700
Harry Edmon <[EMAIL PROTECTED]> wrote:


> Andrew Morton wrote:
> > On Fri, 16 Jun 2006 09:01:23 -0700
> > Harry Edmon <[EMAIL PROTECTED]> wrote:
> > 
> >> I have a system with a strange network performance degradation from 
> >> 2.6.11.12 to most recent kernels including 2.6.16.20 and 2.6.17-rc6.   
> >> The system has dual single-core Xeons with hyperthreading on.   The 
> >> application is the LDM system from UCAR/Unidata 
> >> (http://www.unidata.ucar.edu/software/ldm).   This system requests 
> >> weather data from a variety of systems using RPC calls over a reserved 
> >> TCP port (388), puts them into a memory mapped queue file, and then 
> >> sends the data out to a variety of downstream requesting systems, again 
> >> using RPC calls.  When the load is heavy, the 2.6.16.20 kernel falls way 
> >> behind with the data ingestion.  The 2.6.11.12 kernel does not.   I have 
> >> tried an experiment with a 2.6.17-rc6 system where it just does the 
> >> ingestion, and not the downstream distribution, and it is able to keep 
> >> up.   I would really appreciate any pointers as to where the problem may 
> >> be and how to diagnose it.  I have attached the config files from both 
> >> kernels and the sysctl.conf file I am using.   I have also included the 
> >> output from "netstat -s" on the 2.6.16.20 system during a time when it 
> >> was having problems.
> >>
> > 
> > (added netdev)
> > 
> > A quick grep indicates that it isn't using TCP_NODELAY - we've had problems
> > with that in the past.
> > 
> > Perhaps a tcpdump of the net traffic will help to determine what's going on.
> 

[ edit, edit - please don't top-post ]

> I assume you are talking about using TCP_NODELAY as a socket option within
> the LDM software.  I could give that a try.

The absence of TCP_NODELAY caused problems with the JVM debugger.  I'm not
suggesting that enabling it will fix anything here.
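
For reference, setting the option is a one-line change per socket.  A
minimal sketch in Python follows; LDM itself is C, where the equivalent
call is setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)).  The
helper name make_nodelay_socket is hypothetical and nothing here is taken
from the LDM source.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled.

    Hypothetical helper for illustration only; it is not part of LDM.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the stack to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm the kernel accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Whether this changes anything for LDM is untested; it only shows where the
option would be set.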

> 
> There is a lot of traffic on this node, on the order of 2000 packets in and
> out per second, so the tcpdump output will grow pretty fast.  How long a
> tcpdump would be useful, and what options would you suggest?

I don't know, frankly - first one needs to develop some sort of theory,
then use the diagnostic tools to prove or disprove that theory.  And I
don't have a theory.

I guess a simple one-second bare `tcpdump -i eth0' would be a starting
point.  Perhaps compare the output of that with the output from a
correctly-operating kernel, see if anything suggests itself.  That might
also give us something which the networking developers can use.
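
One way to take the same one-second capture on both kernels is to wrap
tcpdump in a small script.  This is only a sketch under assumptions: the
function names are hypothetical, "eth0" is just the interface from the
command above, and the -n (no name resolution) and -l (line-buffered
output) flags are additions to the bare command, chosen so that the
capture is not slowed by DNS lookups and so that piped output is not lost.
Running it needs tcpdump installed and sufficient privileges.

```python
import subprocess


def tcpdump_command(interface="eth0"):
    """Build the argv for a plain text-mode capture on one interface."""
    # -l: line-buffer stdout so output survives being piped
    # -n: do not resolve addresses to names
    return ["tcpdump", "-l", "-n", "-i", interface]


def capture(interface="eth0", seconds=1.0):
    """Run tcpdump for roughly `seconds` and return its text output."""
    proc = subprocess.Popen(
        tcpdump_command(interface),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Stop the capture once the time window has elapsed, then
        # collect whatever was printed up to that point.
        proc.terminate()
        out, _ = proc.communicate()
    return out
```

Calling capture("eth0", 1.0) on each kernel and saving the two results to
files would give two traces that can be compared side by side.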
