Re: [Nfsen-discuss] Troubleshooting some odd looking nfdump output

Jason Lixfeld Fri, 11 Jan 2013 03:34:53 -0800

If I disable sampling, I can see my HW counters dropping flows all over the 
place and the graphs start to look real ugly like.  1:1 + drops is clearly not 
in the cards ;)


On 2013-01-11, at 2:57 AM, Adrian Popa <adrian.popa...@gmail.com> wrote:

> Is there a way to measure how many flows you are dropping? Assuming you set 
> sampling to 1:1, the only indication I saw about hardware resources being 
> exhausted was (on a cisco box) getting syslog messages that the TCAM memory 
> was exhausted, but I didn't get any views on how many flows were being 
> dropped. Keep in mind that dropped flows are happening in hardware - before 
> being exported as netflow, so netflow counters wouldn't help.
> 
> Indeed, the solution is to slowly increase the sampling until you get no more 
> warnings, but this should probably be adjusted keeping in mind your maximum 
> traffic values (usually your maximum pps), so the analysis needs to be done 
> in peak traffic hours.
> 
> The difference in data accuracy would probably be this - with sampling you 
> have some control in selecting which packets you count (e.g. 1 packet in 200, 
> or time-based (on some platforms): all packets in a 64ms window every 512ms). 
> You can then estimate mathematically what the original traffic was (but since 
> you saw 1 packet in 200, you don't really know if the other 199 packets 
> belonged to the same flow or not), so it doesn't give you an accurate picture.
> 
> Without sampling, you're pretty much on your own - you won't be able to make 
> this mathematical correction because you don't know you are seeing 1 packet 
> in 200 or which packet you are seeing.
> 
> Sampling gives you the ability to make a better estimation of your traffic 
> (you control the periods of reading) than non-sampling + dropping.
> 
> If you know you are dropping just .002% of your flows, then 1:1 is better 
> than sampling, but the problem is - you won't know how much traffic you're 
> dropping.
> 
> @Peter: I agree, nfsen shouldn't need to compete with Arbor Peakflow. For a 
> lot of things it works better than Arbor Peakflow - e.g. it actually stores 
> the flows, and gives you the ability to manipulate them through plugins. I 
> also agree that adding SNMP-based correction would over-complicate things, 
> but it was something I had seen done and I wanted to let you know about it in 
> case you weren't aware of it.
> 
> 
> On Thu, Jan 10, 2013 at 10:11 PM, Jason Lixfeld 
> <jason-nfsen-disc...@lixfeld.ca> wrote:
> Thanks all for your replies.  So I guess for what I'm looking for, dropping 
> the sample rate to the point where it just hovers on the HW limits of my 
> platform is probably where I want to be.
> 
> So hypothetically speaking, what's the difference between sampling and not 
> sampling and letting the router drop flows if it bumps up against its 
> hardware limit?  I guess that depends on how far over the HW limit your flows 
> go - that is, if you are dropping 5% of your flows your numbers might be 
> kinda messy, but if you are dropping .002% of your flows, that might not be a 
> bad compromise to 1:1 sampling?
> 
> On 2013-01-10, at 3:05 PM, Peter Haag <ph...@users.sourceforge.net> wrote:
> 
> > Sampling is indeed tricky and Adrian explained the facts pretty well.
> >
> > Just to make a few additions and remarks, also to the mail from Jason.
> > This is btw. to the best of my knowledge, and I'm glad for any other
> > input which may help.
> >
> > @Jason: "Sampling doesn't quelch the number of flow records"
> > This is not correct in my view. You miss small flows that fit entirely
> > within your sampling interval. The smaller the flows and the higher the
> > sampling rate, the more you lose. To my knowledge, there is no simple and
> > accurate way to properly calculate or "guess" the number of flows. That's
> > the reason nfdump does not touch the number of flows. It does apply the
> > sampling rate to packets and bytes, as this is a reasonable approach. The
> > question is how close you come.
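To put a rough number on the "small flows get missed" point, here is a minimal Python sketch. It assumes random sampling where every packet is picked independently with probability 1/N; real routers may use deterministic 1-in-N or time-based sampling instead, so treat the figures as an approximation. The function name is made up for illustration.

```python
def prob_flow_seen(flow_packets, sampling_rate):
    """Probability that at least one packet of a flow is sampled.

    Assumption: each packet is sampled independently with probability
    1/sampling_rate (random sampling), which is a simplification.
    """
    return 1.0 - (1.0 - 1.0 / sampling_rate) ** flow_packets


# Chance that a flow of n packets shows up at all with 1:1000 sampling.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(prob_flow_seen(n, 1000), 4))
```

Under that assumption a flow of a handful of packets is almost never exported, which is why the flow count cannot simply be multiplied by the sampling rate the way packets and bytes can.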
> >
> > Furthermore, the shorter a flow is, the less accurate its bps and pps are,
> > as nfdump calculates those based on the 'corrected estimated values'.
> >
> > @Adrian: NfSen was never built to be a competing netflow product to
> > Arbor Peakflow :)
> > If we did SNMP queries, this would basically give you an estimate
> > of the accuracy for overall packets and bytes. While this is not
> > impossible to implement in NfSen, would it help more or just be asking for
> > trouble? It would require you to configure your netflow settings exactly
> > for the interface in question, or you would need to make a per-interface
> > evaluation.
> >
> > Any other suggestions? I'm open to new ideas to make nfdump more
> > useful, also for sampled flows.
> >
> >       - Peter
> >
> >
> > On 10/1/13 8:14 AM, Adrian Popa wrote:
> >> Depending on what you want to use the netflow data for, sampling could be
> >> low or high. Low sampling gives you more accurate data for a specific flow,
> >> high sampling can give you some average data for the whole box. The more
> >> details you want, the lower the sampling has to be.
> >>
> >> Keep in mind one more fact - newer versions of nfsen do "sampling
> >> correction" - meaning they can detect the sampling rate (it's normally
> >> exported by the router) and adjust the flow values according to this
> >> sample rate.
> >>
> >> In your case, I would say that nfsen received a flow record with only 2
> >> packets, with duration 16ms and based on sampling 1:1000 it "adjusted" it
> >> to 2000 packets. I'm pretty sure the traffic volume is adjusted as well.
> >> This can be misleading for small traffic values, but you can generally
> >> exclude these by filtering for flows with duration > 1000 (at least 1
> >> second).
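The correction described above can be sketched in a few lines of Python. This is only an illustration, assuming the correction is a plain multiplication of packets and bytes by the sampling rate while the flow count is left alone; the function name is hypothetical and the 3000-byte input is inferred from the 3.0 M shown in the output, not read from the raw record.

```python
def apply_sampling_correction(packets, bytes_, sampling_rate):
    """Scale sampled packet and byte counts by the sampling rate (1:N).

    The number of flows is deliberately not touched.
    """
    if sampling_rate < 1:
        raise ValueError("sampling rate must be >= 1")
    return packets * sampling_rate, bytes_ * sampling_rate


# A record with 2 sampled packets (assumed 1500 bytes each), exported at 1:1000.
est_packets, est_bytes = apply_sampling_correction(2, 3000, 1000)
print(est_packets, est_bytes)
```

With 1:1000 sampling every corrected packet count is a multiple of 1000, which fits the "round" 2000 and 3000 values in the listing.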
> >>
> >> To disable this sampling correction, you would need to start your collector
> >> with the -s -1 parameter (force sampling to 1, overriding the exported
> >> rate), but your graphs would probably be 1000 times smaller in values. :)
> >>
> >> @Peter: I know sampling is tricky, but I've noticed an option in a
> >> competing netflow product - Arbor Peakflow - that can get better results.
> >> They also read the sampling values exported by the router, but make
> >> periodic SNMP queries to read the traffic values on exporting interfaces.
> >> They then try to see if the netflow traffic seen on router X, interface Y
> >> matches the SNMP traffic for the same router and interface. If the ratio is
> >> close to 1:1, then their sampling correction is ok. If it's offset, either
> >> they are not getting the whole netflow traffic for that interface, or they
> >> are not correcting it correctly. I think that in this case, they
> >> dynamically change the sampling rate in their corrections in order to make
> >> the two readings match...
> >>
> >> This would involve quite a few changes in nfsen, and would probably annoy
> >> router administrators (nobody wants yet another management app to read SNMP
> >> values from an overloaded router), but might be something worth considering
> >> in the future if this gets out of hand.
> >>
> >> Regards,
> >> Adrian
> >>
> >>
> >> On Thu, Jan 10, 2013 at 12:12 AM, Jason Lixfeld <
> >> jason-nfsen-disc...@lixfeld.ca> wrote:
> >>
> >>>
> >>> On 2013-01-07, at 2:03 AM, Adrian Popa <adrian.popa...@gmail.com> wrote:
> >>>
> >>>> If you are worried instead about the low volume of traffic seen from
> >>>> this AS, keep in mind the following:
> >>>> 1. You are probably using sampling on your router. NFSEN accounts for
> >>>> sampling and tries to guesstimate some of the values.
> >>>
> >>> I am sampling.  1:1000.
> >>>
> >>> Maybe I don't quite understand sampling.  Sampling doesn't quelch the
> >>> number of flow records exported to the collector, it quelches the number
> >>> of packets that are processed by the device in order to create the flow
> >>> record.  Is that accurate?
> >>>
> >>> So I just re-ran the math from the output below.  Let's take this one for
> >>> argument's sake:
> >>>
> >>> 2013-01-03 10:10:43.424     0.016 any               30513        2( 0.0)     2000( 0.0)    3.0 M( 0.0)   125000    1.5 G  1500
> >>>
> >>> So what that is saying is that the statistic entry for AS30513 was first
> >>> seen on 2013-01-03 10:10:43.424, and consists of 16ms worth of data where
> >>> 2 flows totalling 3MB of data volume spread across 2000 packets were
> >>> collected within those 16ms.  The flow records have no knowledge of pps,
> >>> bps or bpp, so nfdump calculates those values based on the data that it
> >>> knows about: time (16ms), volume (3MB) and total number of packets based
> >>> on the exported flow records received by nfcapd.
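The arithmetic behind that row can be reproduced with a short Python sketch. It assumes pps, bps and bpp are simple ratios of the corrected totals over the flow duration, and it uses integer truncation because that matches the figures in the listing (for example 41666 pps for 2000 packets in 48ms); whether nfdump rounds exactly this way is an assumption, and the function name is made up.

```python
def derived_rates(duration_ms, packets, bytes_):
    """Return (pps, bps, bpp) from a duration in milliseconds and totals.

    Integer arithmetic avoids floating-point surprises; results are truncated.
    """
    if duration_ms <= 0 or packets <= 0:
        return 0, 0, 0
    pps = packets * 1000 // duration_ms      # packets per second
    bps = 8 * bytes_ * 1000 // duration_ms   # bits per second
    bpp = bytes_ // packets                  # bytes per packet
    return pps, bps, bpp


# AS 30513 row: 16 ms, 2000 packets, 3.0 M bytes (all after sampling correction).
pps, bps, bpp = derived_rates(16, 2000, 3_000_000)
print(pps, bps, bpp)
```

So a couple of sampled packets inside a 16ms window are enough to produce a 1.5 G bps entry, which is why very short flows float to the top when ordering by bps.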
> >>>
> >>> So if this is true, then trying to use bps as a statistic orderby will
> >>> never provide you with decent results because those values are calculated
> >>> based on data that might have been quelched based on the way the sampling
> >>> works.
> >>>
> >>> If this is correct, it seems to me like sampling is bad (but I can't
> >>> actually not sample or else my routers drop netflow packets; they can only
> >>> handle 100k across the entire box), but I understand why it exists.  So if
> >>> sampling is the root cause of all these "bad" calculations, it would stand
> >>> to reason that one should set the sampling rate as close to 1:1 as
> >>> possible?
> >>>
> >>>> 2. You may have some spoofed traffic in your network that sends few
> >>>> packets (hence the very short duration), but because of sampling, you get
> >>>> a high count of packets (and usually this is a "round" number).
> >>>>
> >>>> On Sat, Jan 5, 2013 at 9:44 AM, Peter Haag <ph...@users.sourceforge.net>
> >>>> wrote:
> >>>> Hi Jason,
> >>>> Looking at your output, I cannot find anything weird. Please keep in
> >>>> mind: each flow has two ASes, so check in how many flows these ASes
> >>>> appear. Your second example makes it clear: you filter for 'as 30513',
> >>>> which results in two flows - AS 30513 <-> AS 0. AS 0 means the exporting
> >>>> router has no AS info. These resulting two flows are now ordered by AS and
> >>>> by bps as requested. Each AS appears in each flow -> in 100% of all flows.
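The "each AS appears in 100% of the flows" point can be shown with a small Python sketch. It assumes the AS statistic counts a flow once for every AS it touches, whether as source or destination; the direction of the two example flows is a guess, since the thread only says AS 30513 <-> AS 0, and counting a flow once when both ends share an AS is a choice of this sketch, not a documented nfdump behaviour.

```python
from collections import Counter


def flows_per_as(flows):
    """Count, for every AS, the flows in which it appears as src or dst."""
    counts = Counter()
    for src_as, dst_as in flows:
        # A set, so a flow with src AS == dst AS is counted only once here.
        for asn in {src_as, dst_as}:
            counts[asn] += 1
    return counts


# Two flows between AS 30513 and AS 0 (directions are illustrative).
flows = [(30513, 0), (0, 30513)]
counts = flows_per_as(flows)
for asn, n in sorted(counts.items()):
    print(asn, n, f"{100.0 * n / len(flows):.1f}%")
```

Because every flow is attributed to two ASes, the percentages in an AS statistic are not expected to add up to 100.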
> >>>>
> >>>> The same math is now applied for your first run. But you only have the
> >>>> flows of the first top 10 ASes by bps. In % the digits are way below what
> >>>> can be displayed. You may also use -N to prevent scaling (K, M, G, T) in
> >>>> order to see the actual number. To sum up, you would need the output of
> >>>> all seen ASes: -n 0 .
> >>>>
> >>>> Hope, this helps, otherwise let me know, if I can help
> >>>>
> >>>>        - Peter
> >>>>
> >>>> On 4/1/13 5:20 PM, Jason Lixfeld wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> So I'm just playing around with my first 36 hours worth of data and
> >>>>> I'm seeing some stuff that looks sort of off:
> >>>>>
> >>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front  -T  -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
> >>>>> nfdump filter:
> >>>>> any
> >>>>> Top 10 AS ordered by bps:
> >>>>> Date first seen          Duration Proto                AS    Flows(%)     Packets(%)       Bytes(%)         pps      bps   bpp
> >>>>> 2013-01-02 22:39:46.290 130797.681 any                   0   21.1 M(85.9)   42.2 G(87.5)   30.0 T(88.5)   322585    1.8 G   710
> >>>>> 2013-01-03 10:10:43.424     0.016 any               30513        2( 0.0)     2000( 0.0)    3.0 M( 0.0)   125000    1.5 G  1500
> >>>>> 2013-01-03 08:53:20.734     0.015 any               37957        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   133333  810.7 M   760
> >>>>> 2013-01-04 10:23:02.606     0.017 any               35414        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   117647  727.5 M   773
> >>>>> 2013-01-03 14:25:51.067     0.017 any               33428        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   117647  692.7 M   736
> >>>>> 2013-01-03 13:37:35.176     0.039 any               46676        1( 0.0)     2000( 0.0)    2.8 M( 0.0)    51282  582.6 M  1420
> >>>>> 2013-01-04 00:43:04.529     0.048 any               15347        1( 0.0)     2000( 0.0)    2.8 M( 0.0)    41666  473.3 M  1420
> >>>>> 2013-01-03 15:58:33.535     0.077 any               47045        1( 0.0)     3000( 0.0)    4.3 M( 0.0)    38961  442.6 M  1420
> >>>>> 2013-01-02 23:02:16.952 129445.016 any               22822    4.0 M(16.2)    8.9 G(18.5)    6.4 T(19.0)    68835  398.2 M   723
> >>>>> 2013-01-03 14:52:54.865     0.031 any               19354        2( 0.0)     2000( 0.0)    1.5 M( 0.0)    64516  379.9 M   736
> >>>>>
> >>>>> Summary: total flows: 24583165, total bytes: 33.9 T, total packets: 48.2 G, avg bps: 2.1 G, avg pps: 368688, avg bpp: 702
> >>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
> >>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
> >>>>> Sys: 8.970s flows/second: 2740403.8  Wall: 10.563s flows/second: 2327242.5
> >>>>>
> >>>>> Lines 1 and 9 seem OK, but lines 2-8,10 look really weird; the math
> >>>>> just doesn't add up.
> >>>>>
> >>>>> If I filter specifically on AS 30513:
> >>>>>
> >>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front  -T  -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
> >>>>> nfdump filter:
> >>>>> AS 30513
> >>>>> Top 10 AS ordered by bps:
> >>>>> Date first seen          Duration Proto                AS    Flows(%)     Packets(%)       Bytes(%)         pps      bps   bpp
> >>>>> 2013-01-03 10:10:43.424     0.016 any                   0       2(100.0)     2000(100.0)    3.0 M(100.0)   125000    1.5 G  1500
> >>>>> 2013-01-03 10:10:43.424     0.016 any               30513       2(100.0)     2000(100.0)    3.0 M(100.0)   125000    1.5 G  1500
> >>>>>
> >>>>> Summary: total flows: 2, total bytes: 3.0 M, total packets: 2000, avg bps: 1.5 G, avg pps: 125000, avg bpp: 1500
> >>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
> >>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
> >>>>> Sys: 7.574s flows/second: 3245367.9  Wall: 8.594s flows/second: 2860278.3
> >>>>>
> >>>>> I have no idea how to even begin going about troubleshooting this, so
> >>>>> any thoughts are welcomed.
> >>>>>
> >>>>> Thanks again in advance.
> >>>>>
> >>> ------------------------------------------------------------------------------
> >>>>> Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and
> >>>>> much more. Get web development skills now with LearnDevNow -
> >>>>> 350+ hours of step-by-step video tutorials by Microsoft MVPs and
> >>> experts.
> >>>>> SALE $99.99 this month only -- learn more at:
> >>>>> http://p.sf.net/sfu/learnmore_122812
> >>>>> _______________________________________________
> >>>>> Nfsen-discuss mailing list
> >>>>> Nfsen-discuss@lists.sourceforge.net
> >>>>> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
> >>>>>
> >>>>
> >>>> --
> >>>> Be nice to your netflow data. Use NfSen and nfdump :)
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> > --
> > Be nice to your netflow data. Use NfSen and nfdump :)
> 
> 


