Depends on the operation. In your case, you could easily split the flow-filter operations across multiple cores.
Run each pipeline in the background (or in its own shell) so that they execute concurrently:

    ./flow-cat /var/netflow/ft/ft-v05.2010-09-29.00* /var/netflow/ft/ft-v05.2010-09-29.01* | ./flow-filter -Ssource -P25 > /tmp/foo1 &
    ./flow-cat /var/netflow/ft/ft-v05.2010-09-29.02* /var/netflow/ft/ft-v05.2010-09-29.03* | ./flow-filter -Ssource -P25 > /tmp/foo2 &
    ./flow-cat /var/netflow/ft/ft-v05.2010-09-29.04* /var/netflow/ft/ft-v05.2010-09-29.05* | ./flow-filter -Ssource -P25 > /tmp/foo3 &

etc. Then, once all of them have finished:

    wait
    ./flow-cat /tmp/foo* | ./flow-stat -f9 -S3 | more

This hints at the Achilles' heel of NetFlow v5 data: there's simply too much data to do arbitrary queries quickly across a large timeframe. If you have an idea of what you will be querying on (e.g., looking for SMTP usage), you can easily prefilter. I think that's how most of us deal with canned queries like SMTP usage, abuse reports, and peering load.

But if you don't know what the query will be in advance, you need a layer of indexing. Much work has been done on this topic, but little of it scales.

Adventures of putting NetFlow in SQL:
http://paintsquirrel.ucs.indiana.edu/pdf/netflow_hawaii.pdf

Survey of numerous NetFlow indexing systems:
http://www.cs.karelia.ru/fdpw/2007/sherikov/sherikov.pdf

I can't find it now on Google, but a year or so ago I read about a hash index system. Essentially, you create one index for the hashed version of each major NetFlow field. Each index entry tracks which raw flow files have flow data matching the hash. When you do a query like "src.ip=x.x.x.x dst.ip=y.y.y.y dst.port=80", you first process the indices to generate a list of which raw flow files may have matches, and then run flow-filter on just those files. This can be a big time saver, and this mechanism is fully compatible with flow-tools. Of course, if your search criteria match something in every flow file, then this won't help one bit.
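A rough sketch of the hash-index idea described above, in Python. This is not the system mentioned above, just a minimal illustration: the bucket count, the field names (src_ip, dst_ip, dst_port), the file names, and the in-memory record format are all assumptions, and reading real flow files is left out entirely.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4096                         # assumed bucket count
FIELDS = ("src_ip", "dst_ip", "dst_port")  # hypothetical field names


def bucket(value):
    """Map a field value to a stable hash bucket."""
    return zlib.crc32(str(value).encode()) % NUM_BUCKETS


def build_index(files):
    """Build one index per field.

    files maps a flow file name to an iterable of record dicts.
    The result satisfies: index[field][bucket] = set of file names
    containing at least one record whose field hashes to that bucket.
    """
    index = {field: defaultdict(set) for field in FIELDS}
    for name, records in files.items():
        for rec in records:
            for field in FIELDS:
                index[field][bucket(rec[field])].add(name)
    return index


def candidate_files(index, query):
    """Return the sorted file names that MAY contain matches for query.

    query maps field name -> wanted value. False positives are possible
    (hash collisions, or the fields matching in different records of
    the same file); false negatives are not. An empty query returns [].
    """
    result = None
    for field, value in query.items():
        names = index[field].get(bucket(value), set())
        result = set(names) if result is None else result & names
    return sorted(result or [])


# Made-up sample data standing in for two hourly flow files.
files = {
    "hour00": [{"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 80}],
    "hour01": [{"src_ip": "10.0.0.3", "dst_ip": "10.0.0.4", "dst_port": 25}],
}
index = build_index(files)
print(candidate_files(index, {"src_ip": "10.0.0.1", "dst_port": 80}))
```

The returned names are the only files that would then need to go through flow-cat and flow-filter; since the index can only over-report, the real filtering still has to happen on the raw flows.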
-Craig

________________________________________
From: [email protected] [[email protected]] On Behalf Of Drew Weaver [[email protected]]
Sent: Wednesday, September 29, 2010 1:44 PM
To: [email protected]
Subject: [Flow-tools] Speeding up the flow-cat process

Has anyone done any work on speeding up the flow-cat process?

./flow-cat /var/netflow/ft/ft-v05.2010-09-29* | ./flow-filter -Ssource -P25 | ./flow-stat -f9 -S3 | more

A command like this takes about 20 minutes to run on my box which is a Xeon 3360, it looks like only one Core is being used by flow-cat though.

-Drew

_______________________________________________
Flow-tools mailing list
[email protected]
http://mailman.splintered.net/mailman/listinfo/flow-tools
