[Flow-tools] RE: Speeding up the flow-cat process

Andrew O'Brien Wed, 29 Sep 2010 17:49:46 -0700

Hi Craig, Drew,

Firstly, I apologise for referring to what might seem to be a "competitor", but
I've found that I use this tool and flow-tools in tandem quite frequently.


> But if you don't know what the query will be in advance, you need
> a layer of indexing. Much work has been done on this topic, but little
> of it scales.
> 
> Adventures of putting netflow in SQL:
> http://paintsquirrel.ucs.indiana.edu/pdf/netflow_hawaii.pdf
> 
> Survey of numerous Netflow indexing systems:
> http://www.cs.karelia.ru/fdpw/2007/sherikov/sherikov.pdf

At $work we use the SiLK tools from NetSA for storage and searching/reporting
of flow data that we ingest from various remote sources. Some of those sources
are flow-tools files, some v5 PDUs, some raw pcap, and some IPFIX, so this fit
the bill nicely. It's all unidirectional though, from vague memory.

The storage is slightly indexed and optionally compressed but you still deal 
with the data in a similar way to the flow-tools suite:

 1) Filter out raw data sets by date/traffic/port/AS/whatever
 2) Pipe to various aggregation tools (or save to file if you need to do
    multiple analyses on the same data set)
 3) Save aggregated data to another data store (in our case a DB) for
    presentation
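
As a rough illustration of the three steps above, here is a minimal Python
sketch. It is not the SiLK or flow-tools interface: the Flow record, the
function names, the sample data and the SQLite table are all made up for the
example, and a real setup would read flows from the tools' own files.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class Flow(NamedTuple):
    # Hypothetical unidirectional flow record; real records carry more fields.
    day: str
    src: str
    dst: str
    dport: int
    nbytes: int


def filter_flows(flows: Iterable[Flow], day: str, dport: int) -> Iterator[Flow]:
    # Step 1: keep only the records matching the date and port of interest.
    return (f for f in flows if f.day == day and f.dport == dport)


def aggregate_bytes_by_src(flows: Iterable[Flow]) -> Dict[str, int]:
    # Step 2: aggregate the filtered set (here: total bytes per source address).
    totals: Dict[str, int] = defaultdict(int)
    for f in flows:
        totals[f.src] += f.nbytes
    return dict(totals)


def store(conn: sqlite3.Connection, day: str, totals: Dict[str, int]) -> None:
    # Step 3: save the aggregate to a separate store for presentation.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bytes_by_src (day TEXT, src TEXT, nbytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bytes_by_src VALUES (?, ?, ?)",
        [(day, src, n) for src, n in sorted(totals.items())],
    )
    conn.commit()


# Made-up sample data, purely for illustration.
SAMPLE = [
    Flow("2010-09-29", "10.0.0.1", "192.0.2.10", 80, 1500),
    Flow("2010-09-29", "10.0.0.1", "192.0.2.11", 80, 500),
    Flow("2010-09-29", "10.0.0.2", "192.0.2.10", 443, 900),
    Flow("2010-09-28", "10.0.0.3", "192.0.2.10", 80, 700),
    Flow("2010-09-29", "10.0.0.3", "192.0.2.10", 80, 300),
]

conn = sqlite3.connect(":memory:")
totals = aggregate_bytes_by_src(filter_flows(SAMPLE, "2010-09-29", 80))
store(conn, "2010-09-29", totals)
```

The filter is a generator, so the aggregation consumes records as they are
produced, much like piping one tool into the next.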

While not a silver bullet, it helped us with storage and made pre-calculated
reporting easier. All the usual warnings about trading IO for CPU apply.

Some stats from one particular installation:

 - approx 1 billion flows/day
 - 8 days take approx 90G of space vs approx 140G for flow-tools files (we
   keep both for about a week)
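
For a feel of what those figures mean per flow, here is a back-of-the-envelope
calculation in Python. It assumes that "G" means 10^9 bytes, that both sizes
cover the same 8 days, and that the rate is a steady 1 billion flows/day. None
of that is exact, so treat the results as rough.

```python
FLOWS_PER_DAY = 1_000_000_000  # approximate figure from the stats above
DAYS = 8
GB = 10**9  # assumption: "G" is decimal gigabytes


def bytes_per_flow(total_gb: float) -> float:
    # Average on-disk bytes per flow over the whole retention window.
    return total_gb * GB / (FLOWS_PER_DAY * DAYS)


silk_bpf = bytes_per_flow(90)         # 11.25 bytes/flow
flow_tools_bpf = bytes_per_flow(140)  # 17.5 bytes/flow
saving = 1 - 90 / 140                 # fraction of space saved, roughly 0.36

print(f"{silk_bpf:.2f} vs {flow_tools_bpf:.2f} bytes/flow, {saving:.0%} smaller")
```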

Hopefully I haven't stepped on too many toes here.

Cheers,

Andrew
_______________________________________________
Flow-tools mailing list
[email protected]
http://mailman.splintered.net/mailman/listinfo/flow-tools
