Re: sample ledgers and anonymising

Anthony Chivetta Fri, 03 Jul 2009 06:31:15 -0700

* Peter Keen <[email protected]> [2009-06-26 17:49:25 -0700]:

> On Fri, Jun 26, 2009 at 5:40 PM, Simon Michael<[email protected]> wrote:
> > Is that kind of hash hard to reverse-engineer ? If I published the --anon
> > version of my company's ledger, how hard would it be for a motivated person
> > to decode the names ?
> 
> SHA-1 is a one-way hash, meaning that it's statistically highly
> improbable that someone could recreate the original text from just the
> hash. In controlled circumstances it's possible to create an
> equivalent plain-text that generates the same SHA-1, but this is
> pretty limited and still won't reveal your original account names.


Please keep in mind that the dictionary one pulls account names from is
MUCH smaller than the dictionary of valid ASCII strings.  Given the
SHA-1 hash of a string $foo, it is easy to tell if this is the hash of
"Bar, Inc." or "Baz, Inc.".  So, while in theory it is hard to reverse,
it becomes very easy in practice if I know anything about your company
or even the language that the accounts will be in.  Finally, because of
rainbow tables, these might even be easy to reverse without knowing
anything about the target company.
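
To make the point concrete, here is a minimal sketch of such a guessing attack. It assumes the anonymised file contains the plain, unsalted SHA-1 hex digest of each name; the helper names (sha1_hex, guess_names) and the sample account names are made up for illustration and are not taken from ledger itself.

```python
import hashlib


def sha1_hex(name: str) -> str:
    """Plain, unsalted SHA-1 of a name, as a hex string (assumed output format)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def guess_names(anon_hashes, candidates):
    """Map each published hash back to a candidate name, where one matches."""
    # Hash every guess once, then look the published hashes up in that table.
    lookup = {sha1_hex(c): c for c in candidates}
    return {h: lookup[h] for h in anon_hashes if h in lookup}


# Hypothetical published data: two hashed names from an anonymised ledger.
published = [sha1_hex("Bar, Inc."), sha1_hex("Expenses:Secret Project")]

# The attacker's guesses: likely customers, common account names, and so on.
candidates = ["Bar, Inc.", "Baz, Inc.", "Assets:Checking"]

recovered = guess_names(published, candidates)
for digest, name in recovered.items():
    print(digest, "->", name)
```

Only the names that appear in the guess list are recovered, but the list costs almost nothing to extend, which is why a small name space defeats a plain hash.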

If you insist on using a straight hash, you should pick a random seed to
use when calculating the hash and not include that seed in the output
file.
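
A rough sketch of what I mean, using HMAC-SHA1 as one common way to mix a secret seed into the hash (this is my substitution, not something ledger does as far as I know). The function names make_seed and anonymise are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_seed(nbytes: int = 32) -> bytes:
    """Generate a random secret seed; keep it out of the published file."""
    return secrets.token_bytes(nbytes)


def anonymise(name: str, seed: bytes) -> str:
    """Keyed hash of a name: HMAC-SHA1 with the secret seed as the key."""
    return hmac.new(seed, name.encode("utf-8"), hashlib.sha1).hexdigest()


seed = make_seed()

# The same name and seed always give the same token, so the ledger stays
# internally consistent, but guesses cannot be checked without the seed.
token = anonymise("Bar, Inc.", seed)
print(token)
```

Without the seed, an attacker cannot hash a guess and compare it, so the dictionary and rainbow table approaches no longer apply directly. It does nothing about what the amounts, dates and account structure give away.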

Also, take AOL's 2006 search data release as an example of data
anonymization gone wrong.

Anthony Chivetta
