Depends on the operation. In your case, you could easily split the flow-filter 
operations across multiple cores.

./flow-cat /var/netflow/ft/ft-v05.2010-09-29.00* \
  /var/netflow/ft/ft-v05.2010-09-29.01* | ./flow-filter -Ssource -P25 > /tmp/foo1 &
./flow-cat /var/netflow/ft/ft-v05.2010-09-29.02* \
  /var/netflow/ft/ft-v05.2010-09-29.03* | ./flow-filter -Ssource -P25 > /tmp/foo2 &
./flow-cat /var/netflow/ft/ft-v05.2010-09-29.04* \
  /var/netflow/ft/ft-v05.2010-09-29.05* | ./flow-filter -Ssource -P25 > /tmp/foo3 &
etc.

Each pipeline is put in the background with & so they run at the same time. 
Once all of them have finished, then 

./flow-cat /tmp/foo* | ./flow-stat -f9 -S3 | more
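
The same split can be scripted rather than typed out by hand. What follows is 
only a rough sketch: it starts one background job per hour instead of one per 
two hours, and the names FLOW_BIN, FLOW_DIR, DAY, OUT_DIR and prefilter_day are 
made up for illustration; they are not flow-tools settings.

```shell
#!/bin/sh
# Sketch only: one background flow-cat | flow-filter job per hour, then wait.
# FLOW_BIN, FLOW_DIR, DAY and OUT_DIR are assumed names, not flow-tools settings.
FLOW_BIN=${FLOW_BIN:-.}                 # directory holding flow-cat and flow-filter
FLOW_DIR=${FLOW_DIR:-/var/netflow/ft}   # where the raw flow files live
DAY=${DAY:-2010-09-29}                  # day to prefilter
OUT_DIR=${OUT_DIR:-/tmp}                # where the per-hour results go

prefilter_day() {
    for hour in 00 01 02 03 04 05 06 07 08 09 10 11 \
                12 13 14 15 16 17 18 19 20 21 22 23; do
        # Each pipeline runs in the background, so the jobs can be scheduled
        # on different cores. An hour with no files only produces an error
        # message and an empty output file.
        "$FLOW_BIN/flow-cat" "$FLOW_DIR/ft-v05.$DAY.$hour"* |
            "$FLOW_BIN/flow-filter" -Ssource -P25 > "$OUT_DIR/foo$hour" &
    done
    wait    # block until every background job has finished
}
```

Call prefilter_day, then run flow-cat over the foo* files and pipe the result 
into flow-stat as before. Starting all 24 jobs at once will oversubscribe a 
four-core box and may simply move the bottleneck to the disk, so it may be 
worth launching them in smaller batches.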

This hints at the Achilles' heel of NetFlow v5 data -- there's simply too much 
data to do arbitrary queries quickly across a large timeframe. If you have an 
idea of what you will be querying on (e.g., looking for SMTP usage), you can 
easily prefilter. I think that's how most of us deal with canned queries like 
SMTP usage, abuse reports, and peering load. But if you don't know what the 
query will be in advance, you need a layer of indexing. Much work has been done 
on this topic, but little of it scales.

Adventures in putting NetFlow into SQL: 
http://paintsquirrel.ucs.indiana.edu/pdf/netflow_hawaii.pdf

Survey of numerous NetFlow indexing systems: 
http://www.cs.karelia.ru/fdpw/2007/sherikov/sherikov.pdf

I can't find it now on Google, but a year or so ago I read about a hash index 
system. Essentially, you create one index for the hashed version of each major 
NetFlow field. Each index tracks which raw flow files have flow data 
matching the hash. When you do a query like "src.ip=x.x.x.x dst.ip=y.y.y.y 
dst.port=80", you first process the indices to generate a list of which raw 
flow files may have matches, and then run flow-filter on just those files. This 
can be a big time saver, and this mechanism is fully compatible with 
flow-tools. Of course, if your search criteria will be in every flow file, then 
this won't help one bit.
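
To make the idea concrete, here is a rough sketch of such a file-level hash 
index. It is not the system I read about, just one guess at how it could be 
built with sh and awk. Everything in it is an assumption: the names NBUCKETS, 
INDEX_DIR, index_file and candidate_files, the toy hash function, and above all 
the input format. It assumes you have some way to dump each raw flow file to 
text as one "srcaddr,dstaddr,dstport" line per flow; how you produce that dump 
is left open.

```shell
#!/bin/sh
# Sketch only: per-field hash indexes that map a bucket to raw flow files.
# NBUCKETS, INDEX_DIR and both function names are hypothetical.
NBUCKETS=${NBUCKETS:-65536}
INDEX_DIR=${INDEX_DIR:-/tmp/flow-index}

# Toy awk hash shared by both functions: field value -> bucket number.
HASH_AWK='
BEGIN { chars = "0123456789." }
function bucket(s,   i, h) {
    h = 0
    for (i = 1; i <= length(s); i++)
        h = (h * 31 + index(chars, substr(s, i, 1))) % n
    return h
}'

index_file() {
    # $1 = raw flow file name
    # stdin = assumed text dump, one "srcaddr,dstaddr,dstport" line per flow
    mkdir -p "$INDEX_DIR"
    awk -F, -v n="$NBUCKETS" -v file="$1" -v dir="$INDEX_DIR" "$HASH_AWK"'
    BEGIN { split("srcaddr dstaddr dstport", name, " ") }
    {
        for (f = 1; f <= 3; f++) {
            key = bucket($f)
            if (!((f, key) in seen)) {      # record each bucket once per file
                seen[f, key] = 1
                print key "\t" file >> (dir "/" name[f] ".idx")
            }
        }
    }'
}

candidate_files() {
    # Arguments are field=value terms, e.g. srcaddr=10.0.0.1 dstport=80.
    # Prints the files that appear in the matching bucket of every term.
    for term in "$@"; do
        field=${term%%=*}
        value=${term#*=}
        awk -F'\t' -v n="$NBUCKETS" -v value="$value" "$HASH_AWK"'
        BEGIN { want = bucket(value) }
        $1 == want { print $2 }' "$INDEX_DIR/$field.idx" | sort -u
    done | sort | uniq -c | awk -v terms="$#" '$1 == terms { print $2 }'
}
```

You would call index_file once per raw flow file as it is written, then hand 
the output of candidate_files to flow-cat and flow-filter in place of the full 
wildcard. The list is only a set of candidates: hash collisions, or a file 
where the source address and the port match in different flows, produce false 
positives, so flow-filter still does the real matching. File names containing 
spaces would break the last awk step.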

-Craig

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Drew Weaver 
[[email protected]]
Sent: Wednesday, September 29, 2010 1:44 PM
To: [email protected]
Subject: [Flow-tools] Speeding up the flow-cat process

Has anyone done any work on speeding up the flow-cat process?

./flow-cat /var/netflow/ft/ft-v05.2010-09-29* | ./flow-filter -Ssource -P25 | 
./flow-stat -f9 -S3 | more

A command like this takes about 20 minutes to run on my box, which is a Xeon 
3360. It looks like only one core is being used by flow-cat, though.

-Drew

_______________________________________________
Flow-tools mailing list
[email protected]
http://mailman.splintered.net/mailman/listinfo/flow-tools
