xlate.cfg:

# really does nothing
xlate-action MULTICAST-PRIVACY
 type ip-address-privacy-mask
 mask 0xFFFFFFFF 0xFFFFFFFF
# lop off the low-order bits of the source and destination addresses
# (one mask drops the lower 8 bits, the other the lower 11)
xlate-action UNICAST-PRIVACY
 type ip-address-privacy-mask
 mask 0xFFFFFF00 0xFFFFF800
xlate-definition abilene_privacy
 term
  filter MCAST
  action MULTICAST-PRIVACY
  stop
# could change this on the fly to not anonymize, say, the root nameservers
# the next time they get attacked
 term
  filter UCAST
  action UNICAST-PRIVACY
filter.cfg:

filter-primitive MCAST
 type ip-address-mask
 permit 224.0.0.0 240.0.0.0

filter-primitive UCAST
 type ip-address-mask
 deny 224.0.0.0 240.0.0.0
 default permit
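To make the masks concrete, here's what ip-address-privacy-mask ends up doing to an address, sketched in Python (illustration only, not flow-tools code): the all-ones multicast mask keeps every bit, while the unicast masks zero the low 8 and 11 bits.

import ipaddress

def privacy_mask(addr, mask):
    """Keep the bits set in mask, zero the rest."""
    return str(ipaddress.IPv4Address(int(ipaddress.IPv4Address(addr)) & mask))

# MULTICAST-PRIVACY: an all-ones mask keeps every bit, so it really does nothing.
print(privacy_mask("233.4.5.6", 0xFFFFFFFF))   # 233.4.5.6

# UNICAST-PRIVACY: 0xFFFFFF00 drops the low 8 bits, 0xFFFFF800 the low 11.
print(privacy_mask("192.0.2.57", 0xFFFFFF00))  # 192.0.2.0
print(privacy_mask("192.0.2.57", 0xFFFFF800))  # 192.0.0.0

Masking is of course many-to-one rather than the one-to-one mapping you're after, which is where the anonymize idea below comes in.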
This can be done in-line by flow-capture, but not flow-fanout. It wouldn't be too much work to add to flow-fanout.
Dropping in code for a new xlate-action "ip-address-anonymize" and adding a few lines to flow-fanout to call the xlate code would probably be the easiest route.
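As a rough sketch of the kind of one-to-one mapping such an action could do (purely illustrative Python, not flow-tools internals; a keyed function or a persisted table would be needed to keep the mapping stable across runs):

import ipaddress

class OneToOneAnonymizer:
    """Hand each distinct real /32 a consistent pseudo-address from a pool,
    so the mapping stays one-to-one for the life of the table."""

    def __init__(self, pool="10.0.0.0/8"):
        self.pool = ipaddress.ip_network(pool)
        self.table = {}

    def anonymize(self, addr):
        if addr not in self.table:
            # Next unused address from the pool; IndexError if it runs out.
            self.table[addr] = str(self.pool[len(self.table)])
        return self.table[addr]

anon = OneToOneAnonymizer()
print(anon.anonymize("198.51.100.7"))  # 10.0.0.0
print(anon.anonymize("203.0.113.9"))   # 10.0.0.1
print(anon.anonymize("198.51.100.7"))  # 10.0.0.0 again -- consistent mapping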
Does the research collector require the data in Cisco's v5 format? flow-send will work for this, but it's UDP and therefore not very reliable. I prefer to use rsync to move the output from the collectors to the processing hosts. A script runs every 60 minutes or so to grab the current day's files from the collectors, and a nightly script does the same thing again with the checksum option turned on.
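Something along these lines, run hourly from cron with the nightly pass adding --checksum; the hostnames, paths, and date layout here are made up for illustration:

#!/usr/bin/env python
"""Hourly pull of the current day's flow files from each collector.
Hostnames, paths, and the date-based layout are placeholders."""

import os
import subprocess
import sys
import time

COLLECTORS = ["collector1.example.net", "collector2.example.net"]
REMOTE_BASE = "/var/flows"   # where flow-capture writes on the collector
LOCAL_BASE = "/data/flows"   # where the processing host keeps its copy

def pull(checksum=False):
    day = time.strftime("%Y-%m-%d")
    for host in COLLECTORS:
        dest = os.path.join(LOCAL_BASE, host, day)
        os.makedirs(dest, exist_ok=True)
        cmd = ["rsync", "-a", "--partial"]
        if checksum:
            cmd.append("--checksum")   # the slower nightly verification pass
        cmd += ["%s:%s/%s/" % (host, REMOTE_BASE, day), dest + "/"]
        subprocess.call(cmd)

if __name__ == "__main__":
    pull(checksum="--checksum" in sys.argv[1:])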
-- mark
On Jul 2, 2004, at 3:32 PM, John Kristoff wrote:
I am constructing a collector that will receive a significant volume of
non-sampled flows. The primary goal is to anonymize the IP addresses in
the flows, but preserve a one-to-one relationship for each /32 address
and then deliver these anonymized flows to a research collector. Yes,
I realize this results in something that is not completely anonymous.
Nevertheless, I'm pondering how best to perform this process in order to turn the anonymized flows around to the research collector as quickly as possible with the hardware available. It seems there are two basic approaches. One is to pipe received flows to an anonymization process, which in turn sends its output to flow-send via another pipe.
The other is to fully capture the flows to disk, run the anonymization
process on the stored flows, and pipe its output to flow-send.
The latter option would seem to result in a large I/O penalty, but I
wonder if it offers an advantage in reliability for flow delivery to the
research collector.
Note that non-anonymized flows will be captured to disk on the initial collector for non-research storage and analysis purposes anyway, so disk writes are already going to be done. Some periodic disk reads will be done using other utilities like those in the flow-tools package or FlowScan. I'd like to avoid having to add additional hardware for handling the anonymization or local storage/analysis processes. The collector receiving the initial flows is a recent Intel dual processor box with 12 GB of RAM and a few hundred gigabytes of available disk.
I envision running a flow-fanout process that hands one copy of the flows to a local flow-capture process for the non-research purposes and hands another copy to flow-receive, which feeds an anonymization process, which in turn sends its output directly to flow-send for delivery to the research collector. That seems like the easiest and most scalable approach with what I have to work with.
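Concretely, the piped leg of that would be wired up something like the following; the ports, addresses, and anonymizer command are placeholders, and I'm assuming flow-receive writes the flow stream to stdout and flow-send reads it from stdin:

#!/usr/bin/env python
"""Wire up flow-receive | anonymizer | flow-send as one pipeline.
Ports, addresses, and the anonymizer command are placeholders."""

import subprocess

# The second copy of the flows from flow-fanout arrives on this port.
receive = subprocess.Popen(["flow-receive", "0/0/9556"],
                           stdout=subprocess.PIPE)

# Hypothetical filter that rewrites the addresses and passes the stream on.
anonymize = subprocess.Popen(["./anonymize-flows"],
                             stdin=receive.stdout,
                             stdout=subprocess.PIPE)
receive.stdout.close()   # so flow-receive sees SIGPIPE if the filter exits

# Ship the anonymized stream to the research collector.
send = subprocess.Popen(["flow-send", "0/192.0.2.10/9995"],
                        stdin=anonymize.stdout)
anonymize.stdout.close()

send.wait()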
Thoughts about this or suggestions for a design that is as robust as can be?
John

_______________________________________________
Flow-tools mailing list
[EMAIL PROTECTED]
http://mailman.splintered.net/mailman/listinfo/flow-tools
