A number of years ago I wrote a comprehensive system monitoring tool called collectl, which among other things lets me display network traffic in real time as well as log the data to a file. That file can also be generated in a format suitable for plotting with gnuplot. As it turned out, I would very frequently see spikes of 200MB/sec on my 1Gb link, which is well above the roughly 125MB/sec such a link can carry. A colleague noticed that the network counters were being updated every 0.9765 seconds rather than once a second, so a tool sampling once a second will occasionally catch two counter updates in a single interval and report about double the real rate. I don't know how long this problem has existed, but it was certainly there in 2.4 kernels. My tool is capable of sampling at a fractional interval, so by monitoring every 0.9765 seconds I have been able to get good data in spite of this problem. However, I've since noticed that the stats are now updated once a second, which means that when I sample at 0.9765 seconds I get the wrong numbers again. Clearly one answer is to just update the counters more frequently, but I suspect that is not being done for performance reasons.
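To make the effect concrete, here is a small simulation sketch in Python. It is not code from collectl or the kernel; the function names, the steady 100MB/sec traffic rate, and the idea that the counter jumps all at once at each refresh are assumptions made purely for illustration. Time is kept in integer ticks of 100 microseconds so that 0.9765 seconds is exact.

```python
TICKS_PER_SECOND = 10_000  # one tick = 100 microseconds, so timing is exact integer math


def counter_at(t_ticks, rate_bytes_per_sec, update_ticks):
    """Value of the byte counter visible at time t_ticks.

    Assumption: the counter only advances when it is refreshed, once
    every update_ticks, so between refreshes a reader sees a stale value.
    """
    refreshes = t_ticks // update_ticks
    return refreshes * rate_bytes_per_sec * update_ticks // TICKS_PER_SECOND


def sampled_rates(sample_ticks, update_ticks, n_samples, rate_bytes_per_sec):
    """Rates (bytes/sec) a tool would report when polling every sample_ticks."""
    rates = []
    prev = counter_at(0, rate_bytes_per_sec, update_ticks)
    for n in range(1, n_samples + 1):
        cur = counter_at(n * sample_ticks, rate_bytes_per_sec, update_ticks)
        # The tool divides the counter delta by its own sampling interval.
        rates.append((cur - prev) * TICKS_PER_SECOND / sample_ticks)
        prev = cur
    return rates


RATE = 100_000_000  # assumed steady traffic: 100 MB/sec

# Counters refreshed every 0.9765 s, tool polls every 1 s.
old_1s = sampled_rates(10_000, 9_765, 100, RATE)
# Counters refreshed every 0.9765 s, tool polls every 0.9765 s.
old_frac = sampled_rates(9_765, 9_765, 100, RATE)
# Counters refreshed every 1 s, tool still polls every 0.9765 s.
new_frac = sampled_rates(9_765, 10_000, 100, RATE)

print("1 s polling, 0.9765 s counters: max", max(old_1s), "min", min(old_1s))
print("0.9765 s polling, 0.9765 s counters: max", max(old_frac), "min", min(old_frac))
print("0.9765 s polling, 1 s counters: max", max(new_frac), "min", min(new_frac))
```

Under these assumptions the once-a-second poller mostly reports slightly under the true rate and roughly every 42nd sample reports about double it, while polling at 0.9765 seconds against the same counters reports the true rate every time. Flip the refresh period to a full second and the fractional poller starts seeing occasional intervals with no counter update at all.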

Anyhow, I just wanted to let people know that ALL tools that monitor once a second on older kernels will get the wrong numbers, and tools that correct for this by using fractional intervals (and I suspect mine is the only one that does) but run on newer kernels will also get the wrong numbers. In any event, if anyone is interested in trying out collectl - it monitors a LOT more than just networks - you can snag a copy from http://collectl.sourceforge.net/ if you'd like to take it for a drive. The website has a lot of output examples to give you a better idea of what it can do. I even included a writeup about the odd network performance observations at http://collectl.sourceforge.net/NetworkStats.html

-mark

