On Jun 8, 2005, at 15:19, [EMAIL PROTECTED] wrote:
> I'd like to come up to speed on the state of the
> art in de-identification (~=anonymization) of data
> especially monitoring data (firewall/hids logs, say).

I don't know the state of the art, but I can tell you the state of the artless. I had a request to share our border router traffic logs (Cisco NetFlow) with a university, so they could try out some anomaly detection schemes they were working on.

(Bkgnd: We don't consider our network topology sensitive. Our traffic logs are subject to a general respect for privacy.)

Since they could send us packets of their choosing, I deemed it useless to obfuscate our own IP addresses. I chose to anonymize all the external addresses. My design note is below.

But then, as fate would have it, the university said they needed the true external addresses. That left me a bit stumped. Perhaps a less chaotic mapping, like one that is bijective between classful network numbers, would do.
============================

obfuscation filter program

  Parameters
    Blocks of IP addresses deemed internal.  Internal includes multicast
    addresses and RFC 1918 "private use" addresses.

  Working data preserved across runs
    For each date, a database of (true address, substituted address) pairings.

  Algorithms
    Substituted addresses are pseudo-random, formed by MD5-hashing a
    string (S | D | A | N) and taking the first 32 bits.
      S = fixed secret hash seed, long term
      D = date of data, in YYYYMMDD format
      A = true address being obfuscated
      N = integer, starting at 0 and incremented if resulting address
          is an internal one or a collision.
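
As a rough illustration of the hashing step, here is a minimal Python sketch. The function name candidate_address, the byte encoding of the fields, and the literal "|" separator are assumptions; the note only says the four fields are combined into one string and the first 32 bits of the MD5 digest are used.

```python
import hashlib
import ipaddress


def candidate_address(secret: bytes, date: str, true_addr: str, n: int) -> ipaddress.IPv4Address:
    """Return the first 32 bits of MD5(S | D | A | N) as an IPv4 address."""
    # Assumption: fields are joined with a literal "|" and encoded as ASCII text.
    material = b"|".join([secret, date.encode(), true_addr.encode(), str(n).encode()])
    digest = hashlib.md5(material).digest()
    # Interpret the first four digest bytes as a big-endian 32-bit address.
    return ipaddress.IPv4Address(int.from_bytes(digest[:4], "big"))
```

Because the date is part of the hashed string, the same true address gets an unrelated substitute on a different day, which matches the per-date mapping database described above.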

    to obfuscate an IP address: {
      if it's internal, return it unchanged.  otherwise
       is a substitute already assigned?  If so, return it.  otherwise
        for ( done = N = 0; !done; N++ ) {
          generate substitute address by hashing as above
          if ( substitute is not internal && !collision ) done = 1
        }
        save forward & reverse mappings
        return the substitute
    }
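
The obfuscation routine above might look roughly like this in Python. This is a sketch under assumptions, not the original program: the class name Obfuscator, the in-memory dictionaries (the note keeps a per-date database on disk instead), and the "|" separator are all hypothetical choices made for illustration.

```python
import hashlib
import ipaddress


class Obfuscator:
    """Per-date mapping from true external addresses to substitutes (sketch)."""

    def __init__(self, secret: bytes, date: str, internal_blocks):
        self.secret = secret
        self.date = date
        self.internal = [ipaddress.ip_network(b) for b in internal_blocks]
        self.forward = {}  # true address -> substitute
        self.reverse = {}  # substitute -> true address

    def is_internal(self, addr) -> bool:
        return any(addr in net for net in self.internal)

    def _candidate(self, addr, n: int):
        # Assumed encoding of (S | D | A | N); the note does not specify one.
        material = self.secret + f"|{self.date}|{addr}|{n}".encode()
        digest = hashlib.md5(material).digest()
        return ipaddress.IPv4Address(int.from_bytes(digest[:4], "big"))

    def obfuscate(self, text: str) -> str:
        addr = ipaddress.IPv4Address(text)
        if self.is_internal(addr):
            return str(addr)  # internal addresses pass through unchanged
        if addr in self.forward:
            return str(self.forward[addr])  # substitute already assigned
        n = 0
        while True:
            sub = self._candidate(addr, n)
            # Retry with the next N if the result is internal or already taken.
            if not self.is_internal(sub) and sub not in self.reverse:
                break
            n += 1
        self.forward[addr] = sub
        self.reverse[sub] = addr
        return str(sub)
```

A caller would build one instance per date, for example Obfuscator(seed, "20050608", ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4"]) plus whatever blocks the site itself owns. The reverse dictionary doubles as the collision check.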

    for each netflow record {
      i = 0
      if ( src is external ) {
        obfuscate src; i++
      }
      if ( dst is external ) {
        obfuscate dst; i++
      }
      if ( i != 1 ) log an unusual condition
      write output
    }
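
A possible Python rendering of the per-record loop follows. It assumes each flow record has already been parsed into a dict with "src" and "dst" keys, which is a hypothetical representation; real NetFlow decoding is out of scope here. The is_external and obfuscate callables are passed in so the sketch stays independent of how addresses are classified and mapped.

```python
def process_records(records, is_external, obfuscate, warn=print):
    """Yield copies of the records with external endpoints replaced."""
    for rec in records:
        external = 0
        out = dict(rec)  # leave the caller's record untouched
        for field in ("src", "dst"):
            if is_external(rec[field]):
                out[field] = obfuscate(rec[field])
                external += 1
        # At a border router one would expect exactly one external endpoint,
        # so zero or two is worth logging.
        if external != 1:
            warn(f"unusual flow: {external} external endpoints in {rec}")
        yield out
```

The record is still written in the unusual case; the log entry only flags it for later inspection, as in the pseudocode.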

Scripts:

generator loops over input files, applying obfuscator, writing temp-named
  output file, then renaming completed output file to permanent name.

mover looks for completed output files, copies them to destination, then
  looks for more, sleeping and retrying if there are none.
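
The temp-name-then-rename handoff between the two scripts could be sketched as below. The ".tmp" suffix and both function names are assumptions for illustration; the point is only that the mover never sees a half-written file, because a file appears under its permanent name in a single rename step.

```python
import os
from pathlib import Path


def write_atomically(dest: Path, lines) -> None:
    """Write to a temporary name, then rename to the permanent name."""
    tmp = dest.with_name(dest.name + ".tmp")  # assumed temp-name convention
    with open(tmp, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
    # Rename within the same directory, so the finished file appears at once.
    os.replace(tmp, dest)


def completed_files(directory: Path):
    """List files the mover may safely copy (anything not temp-named)."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and not p.name.endswith(".tmp")
    )
```

The mover would call completed_files in a loop, copy whatever it finds, and sleep when the list is empty.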

Other notes:

The obfuscated mappings can be regenerated at will if exactly the same data
  is processed in the same sequence, and the secret hash seed is known.


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]