Thanks all for your replies.  So I guess for what I'm looking for, dropping the 
sample rate to the point where it just hovers on the HW limits of my platform 
is probably where I want to be.

So hypothetically speaking, what's the difference between sampling, and not 
sampling but letting the router drop flows when it bumps up against its hardware 
limit?  I guess that depends on how far over the HW limit your flows go - that 
is, if you are dropping 5% of your flows your numbers might be kinda messy, but 
if you are dropping .002% of your flows, that might not be a bad compromise to 
1:1 sampling?

On 2013-01-10, at 3:05 PM, Peter Haag <ph...@users.sourceforge.net> wrote:

> Sampling is indeed tricky and Adrian explained the facts pretty well.
> 
> Just to make a few additions and remarks, also to the mail from Jason.
> This is, btw, to the best of my knowledge, and I'm glad for any other
> input which may help.
> 
> @Jason: "Sampling doesn't quelch the number of flow records"
> This is not correct in my view. You miss small flows which entirely fit
> within your sampling interval. The smaller the flows and the higher your
> sampling rate, the more you lose. To my knowledge, there is no simple and
> accurate way to properly calculate or "guess" the number of flows. That's
> the reason nfdump does not touch the number of flows. It does apply the
> sampling rate to packets and bytes, as this is a reasonable approach. The
> question is how close you come.
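
To put a rough number on how many small flows go missing, here is a minimal sketch. It assumes the router samples each packet independently with probability 1/N; real routers may use deterministic 1-in-N or other schemes, so treat this as an approximation. The function name `p_flow_seen` is made up for illustration and is not part of nfdump.

```python
def p_flow_seen(packets_in_flow: int, sampling_rate: int) -> float:
    """Probability that at least one packet of a flow is sampled.

    Assumes independent random sampling of each packet with
    probability 1/sampling_rate (an assumption, not router behaviour).
    """
    if packets_in_flow < 0 or sampling_rate < 1:
        raise ValueError("need packets_in_flow >= 0 and sampling_rate >= 1")
    # Chance that one given packet is NOT picked by the sampler
    p_miss_one = 1.0 - 1.0 / sampling_rate
    # The flow is exported only if at least one packet was picked
    return 1.0 - p_miss_one ** packets_in_flow


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        print(f"{n:>6} packets at 1:1000 -> seen with p = {p_flow_seen(n, 1000):.4f}")
```

Under this assumption a 10-packet flow at 1:1000 shows up in roughly 1% of cases, which is why the flow count cannot simply be multiplied by the sampling rate.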
> 
> Furthermore, the shorter a flow is, the less accurate its bps and pps are,
> as nfdump calculates those based on the 'corrected estimated values'.
> 
> @Adrian: NfSen was never built to be a competing netflow product to
> Arbor Peakflow :)
> If we did SNMP queries, this would basically give you an estimate
> of the accuracy for overall packets and bytes. While this is not
> impossible to implement in NfSen, would it help more, or just be asking
> for trouble? It would require you to configure your netflow settings exactly
> for the interface in question, or you would need to make a per-interface
> evaluation.
> 
> Any other suggestions? I'm open to new ideas to make nfdump more
> useful, also for sampled flows.
> 
>       - Peter
> 
> 
> On 10/1/13 8:14 AM, Adrian Popa wrote:
>> Depending on what you want to use the netflow data for, sampling could be
>> low or high. Low sampling gives you more accurate data for a specific flow,
>> high sampling can give you some average data for the whole box. The more
>> details you want, the lower the sampling has to be.
>> 
>> Keep in mind one more fact - newer versions of nfsen do "sampling
>> correction" - meaning they can detect the sampling rate (it's normally
>> exported by the router) and adjust the flow values according to this
>> sample rate.
>> 
>> In your case, I would say that nfsen received a flow record with only 2
>> packets, with duration 16ms and based on sampling 1:1000 it "adjusted" it
>> to 2000 packets. I'm pretty sure the traffic volume is adjusted as well.
>> This can be misleading for small traffic values, but you can generally
>> exclude these by filtering for flows with duration > 1000 (at least 1
>> second).
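
The arithmetic behind that "adjusted" record can be reproduced by hand. This is a minimal sketch, assuming the collector simply multiplies sampled packets and bytes by the sampling rate and then derives pps, bps and bpp from the corrected totals and the duration. The function name and the 2 packets / 3000 bytes inputs are a back-calculation from the AS 30513 row, not values read from nfdump itself.

```python
def corrected_stats(sampled_packets: int, sampled_bytes: int,
                    sampling_rate: int, duration_ms: int) -> dict:
    """Scale sampled counters by the sampling rate and derive rates."""
    packets = sampled_packets * sampling_rate
    byte_count = sampled_bytes * sampling_rate
    return {
        "packets": packets,
        "bytes": byte_count,
        # duration is in milliseconds, so multiply by 1000 for per-second rates
        "pps": packets * 1000 / duration_ms,
        "bps": byte_count * 8 * 1000 / duration_ms,
        "bpp": byte_count / packets,
    }


if __name__ == "__main__":
    # Two sampled 1500-byte packets, 1:1000 sampling, 16 ms duration
    print(corrected_stats(2, 3000, 1000, 16))
```

This gives 2000 packets, 3.0 MB, 125000 pps, 1.5 Gbps and 1500 bpp, which matches the row in question. The rates look enormous only because two sampled packets happened to land 16 ms apart.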
>> 
>> To disable this sampling correction, you would need to start your collector
>> with the -s -1 parameter (force the sampling rate to 1), but the values in
>> your graphs would probably be 1000 times smaller. :)
>> 
>> @Peter: I know sampling is tricky, but I've noticed an option in a
>> competing netflow product - Arbor Peakflow - that can get better results.
>> They also read the sampling values exported by the router, but make
>> periodic SNMP queries to read the traffic values on exporting interfaces.
>> They then try to see if the netflow traffic seen on router X, interface Y
>> matches the SNMP traffic for the same router and interface. If the ratio is
>> close to 1:1, then their sampling correction is ok. If it's offset, either
>> they are not getting the whole netflow traffic for that interface, or they
>> are not correcting it correctly. I think that in this case, they
>> dynamically change the sampling rate in their corrections in order to make
>> the two readings match...
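
The cross-check described here boils down to comparing two byte counters for the same router, interface and time window. What follows is a rough sketch of that idea only; I don't know how Peakflow actually implements it, and the function name, the 10% tolerance and the example numbers are all made up.

```python
def check_sampling(netflow_bytes: float, snmp_bytes: float,
                   configured_rate: int, tolerance: float = 0.10):
    """Compare sampling-corrected netflow bytes with SNMP interface bytes.

    Returns (ratio, ok, implied_rate), where implied_rate is the sampling
    rate that would make the netflow total match the SNMP counter.
    """
    if netflow_bytes <= 0 or snmp_bytes <= 0:
        raise ValueError("both byte counts must be positive")
    ratio = netflow_bytes / snmp_bytes
    # "ok" means the corrected netflow total is close to 1:1 with SNMP
    ok = abs(ratio - 1.0) <= tolerance
    implied_rate = configured_rate * snmp_bytes / netflow_bytes
    return ratio, ok, implied_rate


if __name__ == "__main__":
    # Hypothetical numbers: netflow accounts for only half of what SNMP saw
    print(check_sampling(netflow_bytes=5.0e9, snmp_bytes=1.0e10,
                         configured_rate=1000))
```

A ratio well below 1 could mean either lost flow records or a wrong sampling correction; the counters alone cannot tell the two apart.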
>> 
>> This would involve quite a few changes in nfsen, and would probably annoy
>> router administrators (nobody wants yet another management app to read SNMP
>> values from an overloaded router), but might be something worth considering
>> in the future if this gets out of hand.
>> 
>> Regards,
>> Adrian
>> 
>> 
>> On Thu, Jan 10, 2013 at 12:12 AM, Jason Lixfeld <jason-nfsen-disc...@lixfeld.ca> wrote:
>> 
>>> 
>>> On 2013-01-07, at 2:03 AM, Adrian Popa <adrian.popa...@gmail.com> wrote:
>>> 
>>>> If you are worried instead about the low volume of traffic seen from
>>>> this AS, keep in mind the following:
>>>> 1. You are probably using sampling on your router. NFSEN accounts for
>>>> sampling and tries to guesstimate some of the values.
>>> 
>>> I am sampling.  1:1000.
>>> 
>>> Maybe I don't quite understand sampling.  Sampling doesn't quelch the
>>> number of flow records exported to the collector, it quelches the number of
>>> packets that are processed by the device in order to create the flow
>>> record.  Is that accurate?
>>> 
>>> So I just re-ran the math from the output below.  Let's take this one for
>>> argument's sake:
>>> 
>>> 2013-01-03 10:10:43.424     0.016 any               30513        2( 0.0)     2000( 0.0)    3.0 M( 0.0)   125000    1.5 G  1500
>>> 
>>> So what that is saying is that the statistic entry for AS30513 was first
>>> seen on 2013-01-03 10:10:43.424 and consists of 16ms worth of data, in
>>> which 2 flows totalling 3MB of data volume spread across 2000 packets were
>>> collected.  The flow records have no knowledge of pps, bps or bpp,
>>> so nfdump calculates those values based on the data that it knows about;
>>> time (16ms), volume (3MB) and total number of packets based on the exported
>>> flow records received by nfcapd.
>>> 
>>> So if this is true, then trying to use bps as a statistic orderby will
>>> never provide you with decent results because those values are calculated
>>> based on data that might have been quelched based on the way the sampling
>>> works.
>>> 
>>> If this is correct, it seems to me like sampling is bad (but I can't
>>> actually not sample or else my routers drop netflow packets; they can only
>>> handle 100k across the entire box), but I understand why it exists.  So if
>>> sampling is the root cause of all these "bad" calculations, it would stand
>>> to reason that one should set the sampling rate as close to 1:1 as possible?
>>> 
>>>> 2. You may have some spoofed traffic in your network that sends few
>>>> packets (hence the very short duration), but because of sampling, you get
>>>> a high count of packets (and usually this is a "round" number).
>>>> 
>>>> On Sat, Jan 5, 2013 at 9:44 AM, Peter Haag <ph...@users.sourceforge.net> wrote:
>>>> Hi Jason,
>>>> Looking at your output, I cannot find anything weird. Please keep in
>>>> mind: each flow has two ASes (source and destination), so look at how
>>>> many flows these ASes appear in.
>>>> Your second example makes it clear: you filter for 'as 30513', which
>>>> results in two flows - AS 30513 <-> AS 0. AS 0 means the exporting router
>>>> has no AS info. These resulting two flows are now ordered by AS and by
>>>> bps as requested. Each AS appears in each flow -> in 100% of all flows.
>>>>
>>>> The same math is now applied to your first run. But you only have the
>>>> flows of the top 10 ASes by bps. In %, the digits are way below what
>>>> can be displayed. You may also use -N to prevent scaling (K, M, G, T) in
>>>> order to see the actual numbers. For the percentages to sum up, you would
>>>> need the output of all seen ASes: -n 0.
>>>> 
>>>> Hope, this helps, otherwise let me know, if I can help
>>>> 
>>>>        - Peter
>>>> 
>>>> On 4/1/13 5:20 PM, Jason Lixfeld wrote:
>>>>> Hi there,
>>>>> 
>>>>> So I'm just playing around with my first 36 hours worth of data and I'm seeing some stuff that looks sort of off:
>>>>> 
>>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front  -T  -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
>>>>> nfdump filter:
>>>>> any
>>>>> Top 10 AS ordered by bps:
>>>>> Date first seen          Duration Proto                AS    Flows(%)     Packets(%)       Bytes(%)         pps      bps   bpp
>>>>> 2013-01-02 22:39:46.290 130797.681 any                   0   21.1 M(85.9)   42.2 G(87.5)   30.0 T(88.5)   322585    1.8 G   710
>>>>> 2013-01-03 10:10:43.424     0.016 any               30513        2( 0.0)     2000( 0.0)    3.0 M( 0.0)   125000    1.5 G  1500
>>>>> 2013-01-03 08:53:20.734     0.015 any               37957        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   133333  810.7 M   760
>>>>> 2013-01-04 10:23:02.606     0.017 any               35414        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   117647  727.5 M   773
>>>>> 2013-01-03 14:25:51.067     0.017 any               33428        2( 0.0)     2000( 0.0)    1.5 M( 0.0)   117647  692.7 M   736
>>>>> 2013-01-03 13:37:35.176     0.039 any               46676        1( 0.0)     2000( 0.0)    2.8 M( 0.0)    51282  582.6 M  1420
>>>>> 2013-01-04 00:43:04.529     0.048 any               15347        1( 0.0)     2000( 0.0)    2.8 M( 0.0)    41666  473.3 M  1420
>>>>> 2013-01-03 15:58:33.535     0.077 any               47045        1( 0.0)     3000( 0.0)    4.3 M( 0.0)    38961  442.6 M  1420
>>>>> 2013-01-02 23:02:16.952 129445.016 any               22822    4.0 M(16.2)    8.9 G(18.5)    6.4 T(19.0)    68835  398.2 M   723
>>>>> 2013-01-03 14:52:54.865     0.031 any               19354        2( 0.0)     2000( 0.0)    1.5 M( 0.0)    64516  379.9 M   736
>>>>> 
>>>>> Summary: total flows: 24583165, total bytes: 33.9 T, total packets: 48.2 G, avg bps: 2.1 G, avg pps: 368688, avg bpp: 702
>>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
>>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
>>>>> Sys: 8.970s flows/second: 2740403.8  Wall: 10.563s flows/second: 2327242.5
>>>>> 
>>>>> Lines 1 and 9 seem OK, but lines 2-8,10 look really weird; the math just doesn't add up.
>>>>> 
>>>>> If I filter specifically on AS 30513:
>>>>> 
>>>>> ** nfdump -M /opt/nfsen/profiles-data/live/bfr01-hudson:bfr01-mowat:bfr01-front  -T  -R 2013/01/02/nfcapd.201301022305:2013/01/04/nfcapd.201301041055 -n 10 -s as/bps
>>>>> nfdump filter:
>>>>> AS 30513
>>>>> Top 10 AS ordered by bps:
>>>>> Date first seen          Duration Proto                AS    Flows(%)     Packets(%)       Bytes(%)         pps      bps   bpp
>>>>> 2013-01-03 10:10:43.424     0.016 any                   0        2(100.0)     2000(100.0)    3.0 M(100.0)   125000    1.5 G  1500
>>>>> 2013-01-03 10:10:43.424     0.016 any               30513        2(100.0)     2000(100.0)    3.0 M(100.0)   125000    1.5 G  1500
>>>>> 
>>>>> Summary: total flows: 2, total bytes: 3.0 M, total packets: 2000, avg bps: 1.5 G, avg pps: 125000, avg bpp: 1500
>>>>> Time window: 2013-01-02 22:39:34 - 2013-01-04 10:59:43
>>>>> Total flows processed: 24583165, Blocks skipped: 0, Bytes read: 2261849088
>>>>> Sys: 7.574s flows/second: 3245367.9  Wall: 8.594s flows/second: 2860278.3
>>>>> 
>>>>> I have no idea how to even begin going about troubleshooting this, so any thoughts are welcomed.
>>>>> 
>>>>> Thanks again in advance.
>>>>> 
>>>>> _______________________________________________
>>>>> Nfsen-discuss mailing list
>>>>> Nfsen-discuss@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
>>>>> 
>>>> 
>>>> --
>>>> Be nice to your netflow data. Use NfSen and nfdump :)
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> -- 
> Be nice to your netflow data. Use NfSen and nfdump :)

