Peter Phaal
Fri, 22 Jul 2005 11:25:17 -0700
Hi Everyone,

Since my paper "Packet Sampling Basics" was referenced in this discussion, I thought it might be helpful if I clarified a couple of misconceptions about sampling and sFlow.

http://www.sflow.org/packetSamplingBasics/

Firstly, sFlow does use random sampling. If you set a sampling rate of 1/512 on an sFlow-enabled device, it will not simply take every 512th packet, but will instead sample randomly, with an average of one sample for every 512 packets. I agree that non-random 1/N sampling is a bad idea. Cisco has recently added the ability to randomly sample to its NetFlow implementation. If you cannot use random sampling, I would recommend picking a prime number for N.

The second misconception is that you have to make assumptions about the distribution of packets on the wire in order to make random packet sampling accurate. Packets on the wire can be fractal, heavy-tailed, bursty, uniform or any other distribution that you can imagine, and the estimates produced by random packet sampling will still converge to the correct values.

The confusion seems to arise out of the use of the "Central Limit Theorem". The following paper provides a good illustration of the Central Limit Theorem:

"The distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-Normal"

http://www.statisticalengineering.com/central_limit_theorem.htm

If you are a skeptic and do not trust the statistical theory, you can easily convince yourself that sampling works by performing the following experiment (assuming a basic familiarity with tcpdump and Perl):

1. Capture a full packet trace using tcpdump.
2. Run the trace through a script and compute a statistic (say, the fraction of TCP vs. UDP packets in the trace).
3. Now randomly select a fraction of the packets in the original trace and compute the same statistic based only on the samples (Perl has a built-in random number function, rand(); testing for rand() < 0.1 would sample 1 in 10 packets).
4. Repeat step 3 a number of times (making sure you re-seed the random number generator each time).
5. You will find that the estimates you calculate in step 3 are scattered around the true value calculated in step 2. The estimates will be normally distributed around the correct mean (according to the equations in the Packet Sampling Basics paper).

If you have access to an sFlow-capable device:

http://www.sflow.org/products/network.php

you could use sFlowTrend

http://www.inmon.com/products/sFlowTrend.php

to look at traffic through the switch. If you generate a constant stream of packets through the switch (using ping -f, for example), you will see that successive measurements of the flow vary around its correct value.

Packet sampling has been around for a long time and its accuracy is well understood. Sampling has been successfully applied to a wide variety of applications, even to usage-based billing, where accuracy is particularly important.

Peter

-----Original Message-----

Most sampling - including the sFlow paper - assumes a normal distribution (ye olde bell curve). Thus you can establish confidence intervals for your sample as long as the sampling is truly random.

Two flaws:

(1) 1/n sampling is NOT random.

(2) There are studies that show that internet traffic is in fact better described as fractal. At that point, whether sampling works is up for grabs.

Still, a sampling-based methodology will give you SOME indications about your data. What you make of it is up to you. Just as you can always push an analogy too far, you can push samples too far.

-----Burton

WRT (1), 1/n where n is even really bothers me, since lots of protocols are asymmetric, favoring one direction or the other.
With these, perhaps after an initial request, the traffic becomes:

(large packet of data) ->
(small ack packet) <-

Which means you see only one type in your 1/n sample.

Is this a real problem? No clue - but keep an eye on sampled outputs and be willing to adjust if it seems counter to your real network experience.

NetFlow also offers some aggregation options (http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/120newft/120t/120t3/netflow.htm). You would have to add v8 to the plugin, but that doesn't look that hard (at first glance), as the v8 packet is described as a subset of v5, which ntop does support.

-----Original Message-----
From: ntop-bounces at unipi.it [mailto:ntop-bounces at unipi.it] On Behalf Of Chris Moore
Sent: Tuesday, July 12, 2005 7:24 AM
To: ntop at Unipi.IT
Subject: RE: [Ntop] netflow sampling rate

Why sampling is not fatally flawed:

http://www.sflow.org/packetSamplingBasics/index.htm

Of course this only works if you can accept some degree of inaccuracy and if sufficient data is sent/received. With a 1/64 sampling rate you can achieve very good accuracy. But if you're looking for a single packet, or a couple of packets an hour, or something like that, you won't see it accurately.

I was talking with Luca some time back about adding a multiplier function like Ryan describes, but I don't know if any progress has been made or even if this is still on his radar......

Chris

-----Original Message-----
From: ntop-bounces at unipi.it [mailto:ntop-bounces at unipi.it] On Behalf Of Burton Strauss
Sent: Monday, July 11, 2005 4:40 PM
To: ntop at Unipi.IT
Subject: RE: [Ntop] netflow sampling rate

I don't think you can - sampling is usually a function of the COLLECTOR not the receiver (ntop).

Scaling is very different than sampling, btw. And any assumption that 1/64th of the traffic is a good metric for all of the traffic is probably fatally flawed unless you really, really understand your traffic.
-----Burton

-----Original Message-----
From: ntop-bounces at unipi.it [mailto:ntop-bounces at unipi.it] On Behalf Of Brgomeistr
Sent: Monday, July 11, 2005 3:50 PM
To: ntop at listgateway.unipi.it
Subject: [Ntop] netflow sampling rate

How do I set the sampling rate for a netflow interface?

I have created a netflow interface (NetFlow-device.2) using the netflow device plugin, but am unable to find a way to set the sampling rate for this interface. I see that you can modify the sampling rate for a local interface using "Configure > Startup Options".

I am currently exporting netflow data (sampled at a rate of 1 out of every 64 packets) from a router to the ntop server. This data must be scaled (x64) to properly indicate the amount of traffic flowing through the router's interface. Is there a way to do this via the netflow collector included with ntop?

Thanks,
-ryan

_______________________________________________
Ntop mailing list
Ntop@unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop
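
Editorial sketch: the tcpdump experiment described in Peter's reply (steps 1 to 5), together with the 1/n concern raised for even n, can be tried without a capture file. The code below is only an illustration under stated assumptions: it uses Python rather than Perl, it replaces the real packet trace with a synthetic, bursty list of "tcp"/"udp" labels, and every function name (make_trace, tcp_share, random_sample, every_nth, run_experiment) is hypothetical and not taken from sFlow, NetFlow or ntop. A single seeded generator is advanced across trials, so each repetition draws a fresh random sample without explicit re-seeding.

```python
import random
import statistics


def make_trace(n_packets, tcp_fraction, burst_len, rng):
    """Synthetic stand-in for a tcpdump trace: bursts of 'tcp' or 'udp' labels."""
    trace = []
    while len(trace) < n_packets:
        proto = "tcp" if rng.random() < tcp_fraction else "udp"
        trace.extend([proto] * rng.randint(1, burst_len))
    return trace[:n_packets]


def tcp_share(packets):
    """Step 2 statistic: fraction of packets that are TCP."""
    return sum(1 for p in packets if p == "tcp") / len(packets)


def random_sample(trace, rate, rng):
    """Step 3: keep each packet independently with probability `rate`."""
    return [p for p in trace if rng.random() < rate]


def every_nth(trace, n):
    """Non-random 1/n sampling: take every n-th packet."""
    return trace[n - 1::n]


def run_experiment(trials=100, rate=0.1, seed=2005):
    """Steps 2 to 4: true value, then repeated estimates from random samples."""
    rng = random.Random(seed)
    trace = make_trace(50_000, 0.7, 50, rng)
    truth = tcp_share(trace)
    estimates = [tcp_share(random_sample(trace, rate, rng)) for _ in range(trials)]
    return truth, estimates


if __name__ == "__main__":
    truth, estimates = run_experiment()
    print("true TCP share:      ", round(truth, 4))
    print("mean of estimates:   ", round(statistics.mean(estimates), 4))
    print("std dev of estimates:", round(statistics.stdev(estimates), 4))

    # Strictly alternating data/ack traffic, as in the asymmetric-protocol example.
    alternating = ["data", "ack"] * 5000
    print("every 64th packet sees:", set(every_nth(alternating, 64)))
    print("every 61st packet sees:", set(every_nth(alternating, 61)))
```

On the synthetic trace, the random-sample estimates should scatter around the value computed from the full trace even though the packets arrive in bursts. On the strictly alternating traffic, taking every 64th packet only ever lands on one packet type, while an odd N such as the prime 61 alternates between both; real traffic is far less regular than this toy pattern, so treat the second part as a worst-case illustration.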