If I disable sampling, I can see my HW counters dropping flows all over the place and the graphs start to look real ugly like. 1:1 + drops is clearly not in the cards ;)
On 2013-01-11, at 2:57 AM, Adrian Popa <adrian.popa...@gmail.com> wrote:
> Is there a way to measure how many flows you are dropping? Assuming you set
> sampling to 1:1, the only indication I saw about hardware resources being
> exhausted was (on a Cisco box) getting syslog messages that the TCAM memory
> was exhausted, but I didn't get any views on how many flows were being
> dropped. Keep in mind that dropped flows are happening in hardware - before
> being exported as netflow, so netflow counters wouldn't help.
>
> Indeed, the solution is to slowly increase the sampling until you get no more
> warnings, but this should probably be adjusted keeping in mind your maximum
> traffic values (usually your maximum pps), so the analysis needs to be done
> in peak traffic hours.
>
> The difference in data accuracy would probably be this - with sampling you
> have some control in selecting which packets you count (e.g. 1 packet in 200,
> or time-based (on some platforms): all packets in a 64ms window every 512ms).
> You can then estimate mathematically what the original traffic was (but since
> you saw 1 packet in 200, you don't really know if the other 199 packets
> belonged to the same flow or not), so it doesn't give you an accurate picture.
>
> Without sampling, you're pretty much on your own - you won't be able to make
> this mathematical correction because you don't know whether you are seeing
> 1 packet in 200 or which packet you are seeing.
>
> Sampling gives you the ability to make a better estimation of your traffic
> (you control the periods of reading) than non-sampling + dropping.
>
> If you know you are dropping just .002% of your flows, then 1:1 is better
> than sampling, but the problem is - you won't know how much traffic you're
> dropping.
>
> @Peter: I agree, nfsen shouldn't need to compete with Arbor Peakflow. For a
> lot of things it works better than Arbor Peakflow - e.g.
it actually stores
> the flows, and gives you the ability to manipulate them through plugins. I
> also agree that adding SNMP-based correction would over-complicate things,
> but it was something I had seen done and I wanted to let you know about it in
> case you weren't aware of it.
>
>
> On Thu, Jan 10, 2013 at 10:11 PM, Jason Lixfeld
> <jason-nfsen-disc...@lixfeld.ca> wrote:
> Thanks all for your replies. So I guess for what I'm looking for, dropping
> the sample rate to the point where it just hovers on the HW limits of my
> platform is probably where I want to be.
>
> So hypothetically speaking, what's the difference between sampling and not
> sampling and letting the router drop flows if it bumps up against its
> hardware limit? I guess that depends on how far over the HW limit your flows
> go - that is, if you are dropping 5% of your flows your numbers might be
> kinda messy, but if you are dropping .002% of your flows, that might not be a
> bad compromise to 1:1 sampling?
>
> On 2013-01-10, at 3:05 PM, Peter Haag <ph...@users.sourceforge.net> wrote:
>
> > Sampling is indeed tricky and Adrian explained the facts pretty well.
> >
> > Just to make a few additions and remarks, also to the mail from Jason.
> > This is, btw, to the best of my knowledge, and I'm glad for any other
> > input which may help.
> >
> > @Jason: "Sampling doesn't quelch the number of flow records"
> > This is not correct in my view. You miss small flows which entirely fit
> > within your sampling rate. The smaller the flows and the higher you sample,
> > the more you lose. To my knowledge, there is no simple and accurate way
> > to properly calculate or "guess" the number of flows. That's the reason
> > nfdump does not touch the number of flows. It does apply the sampling rate
> > to packets and bytes, as this is a reasonable approach. The question is
> > how close you come.
> >
> > Furthermore, the shorter a flow is, the less accurate are its bps and pps,
> > as nfdump calculates those based on the 'corrected estimated values'.
> >
> > @Adrian: NfSen was never built to be a competing netflow product to
> > Arbor Peakflow :)
> > If we would do SNMP queries, this would basically give you an estimate
> > about the accuracy for overall packets and bytes. While this is not
> > impossible to implement in NfSen, would it help more or be asking for more
> > trouble? It would require you to configure your netflow settings exactly
> > for the interface in question, or you would need to make a per-interface
> > evaluation.
> >
> > Any other suggestions? I'm open to new ideas, to make nfdump of better
> > use, also for sampled flows.
> >
> > - Peter
> >
> >
> > On 10/1/13 8:14 AM, Adrian Popa wrote:
> >> Depending on what you want to use the netflow data for, sampling could be
> >> low or high. Low sampling gives you more accurate data for a specific flow,
> >> high sampling can give you some average data for the whole box. The more
> >> details you want, the lower the sampling has to be.
> >>
> >> Keep in mind one more fact - newer versions of nfsen do "sampling
> >> correction" - meaning, it can detect the sampling rate (it's normally
> >> exported by the router), and adjusts the flow values according to this
> >> sample rate.
> >>
> >> In your case, I would say that nfsen received a flow record with only 2
> >> packets, with duration 16ms and based on sampling 1:1000 it "adjusted" it
> >> to 2000 packets. I'm pretty sure the traffic volume is adjusted as well.
> >> This can be misleading for small traffic values, but you can generally
> >> exclude these by filtering for flows with duration > 1000 (at least 1
> >> second).
> >>
> >> To disable this sampling correction, you would need to start your collector
> >> with -s -1 parameter (set sampling to 1), but your graphs would probably be
> >> 1000 times smaller in values. :)
> >>
> >> @Peter: I know sampling is tricky, but I've noticed an option in a
> >> competing netflow product - Arbor Peakflow - that can get better results.
> >> They also read the sampling values exported by the router, but make
> >> periodic SNMP queries to read the traffic values on exporting interfaces.
> >> They then try to see if the netflow traffic seen on router X, interface Y
> >> matches the SNMP traffic for the same router and interface. If the ratio is
> >> close to 1:1, then their sampling correction is ok. If it's offset, either
> >> they are not getting the whole netflow traffic for that interface, or they
> >> are not correcting it correctly. I think that in this case, they
> >> dynamically change the sampling rate in their corrections in order to make
> >> the two readings match...
> >>
> >> This would involve quite a few changes in nfsen, and would probably annoy
> >> router administrators (nobody wants yet another management app to read SNMP
> >> values from an overloaded router), but might be something worth considering
> >> in the future if this gets out of hand.
> >>
> >> Regards,
> >> Adrian
> >>
> >>
> >> On Thu, Jan 10, 2013 at 12:12 AM, Jason Lixfeld <jason-nfsen-disc...@lixfeld.ca> wrote:
> >>
> >>>
> >>> On 2013-01-07, at 2:03 AM, Adrian Popa <adrian.popa...@gmail.com> wrote:
> >>>
> >>>> If you are worried instead about the low volume of traffic seen from
> >>>> this AS, keep in mind the following:
> >>>> 1. You are probably using sampling on your router. NFSEN accounts for
> >>>> sampling and tries to guesstimate some of the values.
> >>>
> >>> I am sampling. 1:1000.
> >>>
> >>> Maybe I don't quite understand sampling. Sampling doesn't quelch the
> >>> number of flow records exported to the collector, it quelches the number of
> >>> packets that are processed by the device in order to create the flow
> >>> record. Is that accurate?
> >>>
> >>> So I just re-ran the math from the output below.
Let's take this one for
> >>> argument's sake:
> >>>
> >>> 2013-01-03 10:10:43.424      0.016 any   30513      2( 0.0)   2000( 0.0)  3.0 M( 0.0) 125000   1.5 G 1500
> >>>
> >>> So what that is saying is that the statistic entry for AS30513 was first
> >>> seen on 2013-01-03 10:10:43.424, consists of 16ms worth of data where 2
> >>> flows totalling 3MB of data volume spread across 2000 packets was collected
> >>> within those 16ms. The flow records have no knowledge of pps, bps or bpp,
> >>> so nfdump calculates those values based on the data that it knows about;
> >>> time (16ms), volume (3MB) and total number of packets based on the exported
> >>> flow records received by nfcapd.
> >>>
> >>> So if this is true, then trying to use bps as a statistic orderby will
> >>> never provide you with decent results because those values are calculated
> >>> based on data that might have been quelched based on the way the sampling
> >>> works.
> >>>
> >>> If this is correct, it seems to me like sampling is bad (but I can't
> >>> actually not sample or else my routers drop netflow packets; they can only
> >>> handle 100k across the entire box), but I understand why it exists. So if
> >>> sampling is the root cause of all these "bad" calculations, it would stand
> >>> to reason that one should set the sampling rate as close to 1:1 as possible?
> >>>
> >>>> 2. You may have some spoofed traffic in your network that sends few
> >>>> packets (hence the very short duration), but because of sampling, you get a
> >>>> high count of packets (and usually this is a "round" number).
> >>>>
> >>>> On Sat, Jan 5, 2013 at 9:44 AM, Peter Haag <ph...@users.sourceforge.net> wrote:
> >>>> Hi Jason,
> >>>> Looking at your output, I cannot find anything weird. Please keep in mind:
> >>>> Each flow has two ASes, so and so see on how many flows these ASes appear.
> >>>> Your second example makes it clear: You filter for 'as 30513' which results
> >>>> in two flows - AS 30513 <-> AS 0. AS 0 means the exporting router has no AS
> >>>> info. These resulting two flows are now ordered by AS and by bps as requested.
> >>>> Each AS appears in each flow -> in 100% of all flows.
> >>>>
> >>>> The same math is now applied for your first run. But you only have the flows
> >>>> of the first top 10 ASes by bps. In % the digits are way below what can be
> >>>> displayed. You may also use -N to prevent scaling (K, M, G, T) in order to
> >>>> see the actual number. To sum up, you would need the output of all seen ASes:
> >>>> -n 0.
> >>>>
> >>>> Hope this helps, otherwise let me know if I can help
> >>>>
> >>>> - Peter
> >>>>
> >>>> On 4/1/13 5:20 PM, Jason Lixfeld wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> So I'm just playing around with my first 36 hours worth of data and
> >>>>> I'm seeing some stuff that looks sort of off:
> >>>>>
> >>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front -T -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
> >>>>> nfdump filter:
> >>>>> any
> >>>>> Top 10 AS ordered by bps:
> >>>>> Date first seen           Duration Proto    AS     Flows(%)   Packets(%)     Bytes(%)    pps     bps  bpp
> >>>>> 2013-01-02 22:39:46.290 130797.681 any       0 21.1 M(85.9) 42.2 G(87.5) 30.0 T(88.5) 322585   1.8 G  710
> >>>>> 2013-01-03 10:10:43.424      0.016 any   30513      2( 0.0)   2000( 0.0)  3.0 M( 0.0) 125000   1.5 G 1500
> >>>>> 2013-01-03 08:53:20.734      0.015 any   37957      2( 0.0)   2000( 0.0)  1.5 M( 0.0) 133333 810.7 M  760
> >>>>> 2013-01-04 10:23:02.606      0.017 any   35414      2( 0.0)   2000( 0.0)  1.5 M( 0.0) 117647 727.5 M  773
> >>>>> 2013-01-03 14:25:51.067      0.017 any   33428      2( 0.0)   2000( 0.0)  1.5 M( 0.0) 117647 692.7 M  736
> >>>>> 2013-01-03 13:37:35.176      0.039 any   46676      1( 0.0)   2000( 0.0)  2.8 M( 0.0)  51282 582.6 M 1420
> >>>>> 2013-01-04 00:43:04.529      0.048 any   15347      1( 0.0)   2000( 0.0)  2.8 M( 0.0)  41666 473.3 M 1420
> >>>>> 2013-01-03 15:58:33.535      0.077 any   47045      1( 0.0)   3000( 0.0)  4.3 M( 0.0)  38961 442.6 M 1420
> >>>>> 2013-01-02 23:02:16.952 129445.016 any   22822  4.0 M(16.2)  8.9 G(18.5)  6.4 T(19.0)  68835 398.2 M  723
> >>>>> 2013-01-03 14:52:54.865      0.031 any   19354      2( 0.0)   2000( 0.0)  1.5 M( 0.0)  64516 379.9 M  736
> >>>>>
> >>>>> Summary: total flows: 24583165, total bytes: 33.9 T, total packets: 48.2 G, avg bps: 2.1 G, avg pps: 368688, avg bpp: 702
> >>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
> >>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
> >>>>> Sys: 8.970s flows/second: 2740403.8 Wall: 10.563s flows/second: 2327242.5
> >>>>>
> >>>>> Lines 1 and 9 seem OK, but lines 2-8,10 look really weird; the math
> >>>>> just doesn't add up.
> >>>>>
> >>>>> If I filter specifically on AS 30513:
> >>>>>
> >>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front -T -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
> >>>>> nfdump filter:
> >>>>> AS 30513
> >>>>> Top 10 AS ordered by bps:
> >>>>> Date first seen           Duration Proto    AS     Flows(%)   Packets(%)     Bytes(%)    pps     bps  bpp
> >>>>> 2013-01-03 10:10:43.424      0.016 any       0     2(100.0)  2000(100.0) 3.0 M(100.0) 125000   1.5 G 1500
> >>>>> 2013-01-03 10:10:43.424      0.016 any   30513     2(100.0)  2000(100.0) 3.0 M(100.0) 125000   1.5 G 1500
> >>>>>
> >>>>> Summary: total flows: 2, total bytes: 3.0 M, total packets: 2000, avg bps: 1.5 G, avg pps: 125000, avg bpp: 1500
> >>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
> >>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
> >>>>> Sys: 7.574s flows/second: 3245367.9 Wall: 8.594s flows/second: 2860278.3
> >>>>>
> >>>>> I have no idea how to even begin going about troubleshooting this, so
any thoughts are welcomed.
> >>>>>
> >>>>> Thanks again in advance.
> >>>>>
> >>>>
> >>>> --
> >>>> Be nice to your netflow data. Use NfSen and nfdump :)
> >>>>
> >>>
> >>
> >
> > --
> > Be nice to your netflow data. Use NfSen and nfdump :)
> >
_______________________________________________
Nfsen-discuss mailing list
Nfsen-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
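
For anyone following the arithmetic in this thread, here is a minimal Python sketch of the numbers behind the AS 30513 row. It is not nfdump's code; the function names are made up for illustration. The raw counters (2 sampled packets, 3000 bytes) are an assumption, inferred from Adrian's guess and the 1:1000 sampling rate, since the thread only shows the corrected values. The last helper assumes every packet is sampled independently with probability 1/N, which real routers may not do (some sample deterministically or by time window).

```python
# Hypothetical helpers illustrating the arithmetic discussed in the thread.
# This is a sketch, not nfdump's implementation.

def correct_for_sampling(packets, byte_count, sampling_rate):
    """Scale sampled packet and byte counters by the 1:N sampling rate.

    The number of flows is deliberately not scaled, matching Peter's
    description that nfdump does not touch the flow count.
    """
    return packets * sampling_rate, byte_count * sampling_rate


def derived_rates(packets, byte_count, duration_s):
    """Derive pps, bps and bpp from counters and the flow duration."""
    pps = packets / duration_s
    bps = byte_count * 8 / duration_s
    bpp = byte_count / packets
    return pps, bps, bpp


def prob_flow_unseen(flow_packets, sampling_rate):
    """Chance that a flow is missed entirely.

    Assumption: each packet is sampled independently with
    probability 1/sampling_rate.
    """
    return (1 - 1 / sampling_rate) ** flow_packets


# Assumed raw values for the AS 30513 row: 2 packets, 3000 bytes, 16 ms.
raw_packets, raw_bytes, duration = 2, 3000, 0.016

packets, byte_count = correct_for_sampling(raw_packets, raw_bytes, 1000)
pps, bps, bpp = derived_rates(packets, byte_count, duration)
print(f"corrected: {packets} packets, {byte_count} bytes")
print(f"pps={pps:.0f} bps={bps:.0f} bpp={bpp:.0f}")

# Same row without sampling correction, for comparison.
raw_pps, raw_bps, _ = derived_rates(raw_packets, raw_bytes, duration)
print(f"uncorrected: pps={raw_pps:.0f} bps={raw_bps:.0f}")

# How likely a 2-packet flow is to be missed completely at 1:1000.
print(f"P(2-packet flow unseen) = {prob_flow_unseen(2, 1000):.6f}")
```

Under these assumptions the corrected figures line up with the row in the nfdump output (2000 packets, 3.0 M bytes, 125000 pps, 1.5 G bps, 1500 bpp), while the uncorrected rates are 1000 times smaller. The very short duration is what turns two sampled packets into a gigabit-scale bps figure, which is why filtering out flows shorter than a second, as Adrian suggests, removes most of these outliers.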