Hi Brian, This time-wrap is a kind of hassle since ever. It actually has been tested on many exporters - ok mainly on Cisco's, as it seemed to be used widely.
I haven't looked into softflowd overruns, as I expected them too behave the same. I am open to help debugging the stuff, you have me a pcap with an overflow, which could be rather difficult to produce. Alternatively, you could add some LogInfo() messages in case of a wrap around to log the original values to see how compensation needs to be done correctly or even better to fix softflowd. Cheers - Peter On 02.06.16 16:51, Brian Candler wrote: > In one site, I am generating flow records from a pfSense (2.3) box > running softflowd (1.2.1). > > On the collector, nfdump (1.6.13) is compiled with --enable-nsel because > there is also an ASA generating flows. > > Looking at the pfSense flows, I am seeing some strange summary records > where the duration looks to be a 2^32 wraparound (and hence the bps > values are completely wrong). > > $ nfdump -M /var/nfsen/profiles-data/live/lch-fw1 -T -r > 2016/06/02/nfcapd.201606020300 -c 10 -s proto/flows 'src host 10.26.1.30 > and dst host xxx.xx.xx.1' > Top 10 Protocol ordered by flows: > Date first seen Duration Proto Protocol Flows(%) > Packets(%) Bytes(%) pps bps bpp > 2016-04-13 09:57:55.236 4294010.415 ICMP 1 1(100.0) > 24(100.0) 2016(100.0) 0 0 84 > > Summary: total flows: 1, total bytes: 2016, total packets: 24, avg bps: > 0, avg pps: 0, avg bpp: 84 > Time window: 2016-04-13 09:38:02 - 2016-06-02 03:02:51 > Total flows processed: 5373, Blocks skipped: 0, Bytes read: 349768 > Sys: 0.004s flows/second: 1343250.0 Wall: 0.003s flows/second: 1704632.0 > > Note that (4294010.415 * 1000) = 0xfff1662f which is suspiciously near > to 2^32. > > The pfsense box generating these records has been up for 38 days, which > means it booted on 23rd April. Hence the "Date first seen" of 13th April > for the flow record doesn't make much sense. > > I purposedly chose a summary over a single flow for the example above. > Looking at the flow itself: > > $ nfdump -M /var/nfsen/profiles-data/live/lch-fw1 -T -r > 2016/06/02/nfcapd.201606020300 -c 10 -o raw 'src host 10.26.1.30 and dst > host xxx.xx.xx.1' > > Flow Record: > Flags = 0x06 FLOW, Unsampled > export sysid = 1 > size = 56 > first = 1460537875 [2016-04-13 09:57:55] > last = 1464831885 [2016-06-02 02:44:45] > msec_first = 236 > msec_last = 651 > src addr = 10.26.1.30 > dst addr = xxx.xx.xx.1 > ICMP = 8.0 type.code > fwd status = 0 > tcp flags = 0x00 ...... > proto = 1 ICMP > (src)tos = 0 > (in)packets = 24 > (in)bytes = 2016 > > Summary: total flows: 1, total bytes: 2016, total packets: 24, avg bps: > 0, avg pps: 0, avg bpp: 84 > Time window: 2016-04-13 09:38:02 - 2016-06-02 03:02:51 > Total flows processed: 5373, Blocks skipped: 0, Bytes read: 349768 > Sys: 0.004s flows/second: 1343250.0 Wall: 0.004s flows/second: 1295323.0 > > The "first" timestamp appears to be April 13th. > > If I simply add (2^32 / 1000) to the start time, I get: > > new first = 1464832842 > > last = 1464831885 > > ...but this gives a "first" time which is now 957 seconds *after* the > last time :-( > > Looking at the source code I found this (calculations are in > milliseconds at this point): > > uint32_t First = data_record->first; > uint32_t Last = data_record->last; > > if ( First > Last ) > /* First in msec, in case of msec > overflow, between start and end */ > start_time = exporter->boot_time - > 0x100000000LL + (uint64_t)First; > else > start_time = (uint64_t)First + > exporter->boot_time; > > /* end time in msecs */ > end_time = (uint64_t)Last + exporter->boot_time; > > if ( (end_time - start_time) > 0xffc00000 && > table->bytes < 2000 ) { > dbg_printf("CISCO bugfix!\n"); > start_time += 0xffc00000; > } > > There's certainly a subtraction of 2^32 possible there; this implies > that softflowd is giving a flow with First > Last. > > I don't think the 0xffc00000 branch applies here: firstly it would bring > the start_time forward not back, and secondly the (in) bytes count is > greater than 2000. > > However, adding (0xffc00000 / 1000) to the start time, I get a value > which is about 54 minutes before the finish time, which is vaguely > plausible. > > Any suggestions for what is happening here, or how I might go about > debugging it? Or has anyone else seen issues like this with softflowd > and flow start/end timestamps? > > Thanks, > > Brian. > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic > patterns at an interface-level. Reveals which users, apps, and protocols are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e > _______________________________________________ > Nfdump-discuss mailing list > Nfdump-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nfdump-discuss > -- Be nice to your netflow data. Use NfSen and nfdump :) ------------------------------------------------------------------------------ What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic patterns at an interface-level. Reveals which users, apps, and protocols are consuming the most bandwidth. Provides multi-vendor support for NetFlow, J-Flow, sFlow and other flows. Make informed decisions using capacity planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e _______________________________________________ Nfdump-discuss mailing list Nfdump-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfdump-discuss