On Jun 8, 2005, at 15:19, [EMAIL PROTECTED] wrote:
> I'd like to come up to speed on the state of the
> art in de-identification (~=anonymization) of data
> especially monitoring data (firewall/hids logs, say).
I don't know the state of the art, but I can tell you the state of the
artless. I had a request to share our border router traffic logs
(Cisco netflow) with a university, so they could try out some anomaly
detection schemes they were working on.
(Bkgnd: We don't consider our network topology sensitive. Our traffic
logs are subject to a general respect for privacy.)
Since they could send us packets of their choosing, I deemed it useless
to obfuscate our own IP addresses. I chose to anonymize all the
external addresses. My design note is below.
But then, as fate would have it, the university said they needed the
true external addresses. That left me a bit stumped. Perhaps a less
chaotic mapping, like one that is bijective between classful network
numbers, would do.
============================
obfuscation filter program
Parameters
Blocks of IP addresses deemed internal. Internal includes multicast
addresses and RFC 1918 "private use" addresses.
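A minimal sketch of the internal/external test, assuming Python's standard ipaddress module. The site block 192.0.2.0/24 is a hypothetical placeholder; the real list is whatever gets passed in as the parameter above.

```python
import ipaddress

# Hypothetical site block, for illustration only.
SITE_BLOCKS = [ipaddress.ip_network("192.0.2.0/24")]

# Always treated as internal, per the note: multicast and RFC 1918 space.
ALWAYS_INTERNAL = [
    ipaddress.ip_network("224.0.0.0/4"),     # multicast
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
]

def is_internal(addr, site_blocks=SITE_BLOCKS):
    """Return True if addr falls in a site block, multicast, or RFC 1918."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in list(site_blocks) + ALWAYS_INTERNAL)
```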
Working data preserved across runs
For each date, a database of (true address, substituted address)
pairings.
Algorithms
Substituted addresses are pseudo-random, formed by MD5-hashing a
string (S | D | A | N) and taking the first 32 bits.
S = fixed secret hash seed, long term
D = date of data, in YYYYMMDD format
A = true address being obfuscated
N = integer, starting at 0 and incremented if resulting address
is an internal one or a collision.
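The hashing step might look roughly like this in Python. Assumptions not in the note: A is the true address in dotted-quad text form, the fields are joined with a literal "|", and the first 32 bits are read big-endian. The function name is made up for illustration.

```python
import hashlib
import ipaddress

def candidate_address(secret, date, true_addr, n):
    """Derive one candidate substitute address from (S | D | A | N)."""
    # Assumed encoding: the four fields joined with "|" as ASCII text.
    material = "|".join([secret, date, true_addr, str(n)]).encode("ascii")
    digest = hashlib.md5(material).digest()
    # First 32 bits of the digest, read as a big-endian integer (assumed).
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))
```

The caller is expected to retry with N + 1 whenever the candidate is internal or already in use.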
to obfuscate an IP address: {
    if it's internal, return it unchanged. otherwise
    is a substitute already assigned? If so, return it. otherwise
    for ( done = N = 0; !done; N++ ) {
        generate substitute address by hashing as above
        if ( !internal && !collision ) done = 1
    }
    save forward & reverse mappings
}
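Here is a rough, self-contained Python rendering of the routine above. It is a sketch, not the actual filter: the class name is made up, two in-memory dicts stand in for the per-date database, and the "|" separator and big-endian byte order are assumptions.

```python
import hashlib
import ipaddress

class Obfuscator:
    def __init__(self, secret, date, internal_blocks):
        self.secret = secret
        self.date = date
        self.internal = [ipaddress.ip_network(b) for b in internal_blocks]
        self.forward = {}  # true address -> substitute
        self.reverse = {}  # substitute -> true address

    def is_internal(self, addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self.internal)

    def _candidate(self, addr, n):
        # MD5 over (S | D | A | N); keep the first 32 bits.
        material = f"{self.secret}|{self.date}|{addr}|{n}".encode("ascii")
        first32 = int.from_bytes(hashlib.md5(material).digest()[:4], "big")
        return str(ipaddress.IPv4Address(first32))

    def obfuscate(self, addr):
        if self.is_internal(addr):
            return addr                  # internal: pass through unchanged
        if addr in self.forward:
            return self.forward[addr]    # already assigned for this date
        n = 0
        while True:
            sub = self._candidate(addr, n)
            # Reject candidates that are internal or already handed out.
            if not self.is_internal(sub) and sub not in self.reverse:
                break
            n += 1
        self.forward[addr] = sub
        self.reverse[sub] = addr
        return sub
```

A real run would load and save the two dicts per date, so that a second pass over the same day's data reuses the same pairings.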
for each netflow record {
    i = 0
    if ( src is external ) {
        obfuscate src; i++
    }
    if ( dst is external ) {
        obfuscate dst; i++
    }
    if ( i != 1 ) log an unusual condition
    write output
}
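The per-record loop could be sketched as below. The record layout (a dict with "src" and "dst" keys) is purely hypothetical, since netflow parsing is not covered in this note; is_internal and obfuscate are passed in as callables.

```python
import logging

def filter_records(records, is_internal, obfuscate):
    """Yield copies of records with external endpoints replaced."""
    for rec in records:
        out = dict(rec)
        externals = 0
        for field in ("src", "dst"):
            if not is_internal(rec[field]):
                out[field] = obfuscate(rec[field])
                externals += 1
        # A border flow normally has exactly one external endpoint.
        if externals != 1:
            logging.warning("unusual flow: %d external endpoints", externals)
        yield out
```

The warning deliberately omits the addresses, so the log itself does not leak what the filter is hiding.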
Scripts:
generator loops over input files, applying obfuscator, writing
temp-named output file, then renaming completed output file to
permanent name.
mover looks for completed output files, copies them to destination,
then looks for more, sleeping and retrying if there are none.
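The temp-name-then-rename step is what keeps the mover from picking up a half-written file. A minimal sketch, assuming Python and a ".tmp-" prefix that the mover is told to ignore (both the function name and the prefix are made up here):

```python
import os
import tempfile

def write_then_rename(final_path, lines):
    """Write lines to a temp file beside final_path, then rename it."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        os.replace(tmp_path, final_path)  # the permanent name appears here
    except BaseException:
        os.unlink(tmp_path)               # do not leave partial output around
        raise
```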
Other notes:
The obfuscated mappings can be regenerated at will if exactly the
same data is processed in the same sequence, and the secret hash
seed is known.
---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]