On Tue, 2008-06-03 at 10:40 +1200, Steve Shipway wrote:
>
> A two second interval is extremely short! 8-o
>
> I would suggest you check the obvious firstly
> - Are you using SNMPv2? If not, do so, if possible.

I tried it both with and without SNMPv2. There was no perceptible
difference between the graphs.

> - Are you generating so much test traffic that the SNMP packets are being
> dropped?
> - With a 2sec interval, this can mean that the interval is smaller than the
> SNMP timeout or retries time. Any delay would cause data to be skipped, and
> possibly interpolated or set to zero (do you have unknaszero set?) Maybe
> your MRTG server has slow disks that cannot keep up with the IO stream and it
> needs to freeze occasionally to flush the output buffer, missing data polls?

I verified with Wireshark that for every SNMP request sent out a
response was received in a timely fashion (~1ms).

> I am guessing that the odd dips are when the counter wraps around, or rather
> when the MRTG or RRD code thinks it /might/ have wrapped around. Setting to
> SNMPv2 will make this less frequent and less likely, although a 2sec poll is
> unlikely to be wrapping until some crazy number of gigabits per second.
> Maybe the MRTG wrap detection code gets a bit dodgy at these high poll
> frequencies?

That was my initial reaction, but that doesn't seem to be the case. At
11.6Mbps, the rollover on the octet counters should occur on the order
of hours, not seconds or minutes! Also, with a constant data stream, I
would expect any dips due to rollover error to occur at a fairly regular
interval, and this was definitely not the case (see description of
symptoms below).

> If using SNMPv2 makes the dips disappear or occur less often, then it is
> probably a wraparound-detection error. Similarly, if the dips disappear with
> lower poll frequencies then it might be because the normalisation routines
> get upset then the buckets are so small? I'd need to pore over the code for
> hours to deduce any possible misbehaviour when the interval is so small.
>
> Hope this helps,
>
> Steve

Not being at all familiar with the internals of MRTG, I won't speculate
on the specifics, but the observed behavior was that whenever the dips
appeared on the graph, MRTG appeared to be sending an 'extra' request
inside an interval.

e.g., with 2 second intervals:
SNMP-GET requests at 0s, 2s, 4s, 6s, 7s, 8s, 10s, etc.
The dip in the graph would correspond to the 6s-7s-8s sending event, and
dip to a factor of 1/2 of the expected rate.

e.g., with 1 second intervals:
SNMP-GET requests at 0.0s, 1.0s, 2.0s, 2.1s, 3.0s, 4.0s, etc.
The dips in the graph would correspond to the 2.0s-2.1s-3.0s sending
event and dip to a factor of 1/10th of the expected rate.

The router responds to each request in a timely (~1ms) fashion. The
relation of the timing of the errant request to the interval size and
the dip in the graph is what really intrigues me.

The factors that affected the dips appeared to be interval length
(longer intervals = fewer dips and proportionally shallower dips), and
number of OID pairs being polled (fewer OID pairs = fewer dips).

I unfortunately do not have any data to include with this email, as I
spent the last several days working around this problem with a custom
polling script in Perl. I will take some time later this week or this
weekend to reproduce this problem and send you Wireshark captures and
corresponding graphs from the rrd data to better illustrate this
phenomenon. If there is any other data you think might be useful for me
to include, please let me know. :)

Thank you for your time!

J.Williams

_______________________________________________
mrtg mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/mrtg
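
To make the arithmetic behind the two examples above concrete, here is a small sketch of one hypothesis that happens to reproduce the observed 1/2 and 1/10 factors: the counter delta between two polls gets divided by the nominal interval instead of the actual time between the polls. This is an assumption for illustration only, not something taken from the MRTG or RRDtool code, and the function name `apparent_rates` is made up.

```python
def apparent_rates(poll_times, true_rate, nominal_interval):
    """Rate reported per poll if each counter delta is divided by the
    nominal interval rather than by the actual time between polls.

    Hypothetical model for illustration; not MRTG's actual algorithm.
    """
    rates = []
    for prev, cur in zip(poll_times, poll_times[1:]):
        # On a constant stream the counter grows linearly with time.
        delta_octets = true_rate * (cur - prev)
        rates.append(delta_octets / nominal_interval)
    return rates


if __name__ == "__main__":
    # 2 second interval with an extra request at 7s.
    print(apparent_rates([0, 2, 4, 6, 7, 8, 10], 1.0, 2.0))
    # 1 second interval with an extra request at 2.1s.
    print(apparent_rates([0.0, 1.0, 2.0, 2.1, 3.0, 4.0], 1.0, 1.0))
```

With the true rate normalised to 1.0, the extra poll at 7s yields two samples of about 0.5, and the extra poll at 2.1s yields a sample of about 0.1 followed by one of about 0.9. Whether MRTG really behaves this way would need to be checked against the captures.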

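As a cross-check on the rollover estimate earlier in this message, the wrap time of an octet counter can be computed directly. This sketch assumes a 32-bit counter (such as ifInOctets) for the non-SNMPv2 case and a 64-bit counter for the SNMPv2 case; the helper name `seconds_to_wrap` is made up for illustration.

```python
def seconds_to_wrap(counter_bits, rate_bps):
    """Seconds for an octet counter of the given width to wrap once
    at a constant line rate given in bits per second."""
    octets_per_second = rate_bps / 8
    return (2 ** counter_bits) / octets_per_second


if __name__ == "__main__":
    rate = 11.6e6  # 11.6 Mbps test stream, as described in this message
    print(seconds_to_wrap(32, rate) / 60, "minutes for a 32-bit counter")
    print(seconds_to_wrap(64, rate) / 3600, "hours for a 64-bit counter")
```

For a 32-bit counter at 11.6 Mbps this comes out to roughly 49 minutes per wrap, so a wrap cannot account for dips that appear seconds apart and at irregular intervals.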