It's been a while since I dug into it, but something like an 8-bit CRC <http://en.wikipedia.org/wiki/Cyclic_redundancy_check> would probably provide enough disambiguation while still colliding often enough that reversing it isn't much of a concern: there are only 256 possible output values, so a very large number of IDs share each one.
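To make the 8-bit CRC idea concrete, here is a minimal sketch in Python. The polynomial (0x07, the common CRC-8 variant with initial value 0 and no final XOR) is my assumption, since nothing in the thread picks one, and `disambiguation_tag` and the sample ID strings are hypothetical names and values made up for illustration.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0, no final XOR.

    The polynomial 0x07 is an assumed choice, not one named in the thread.
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift out and fold in the polynomial.
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def disambiguation_tag(id_number: str) -> int:
    """Hypothetical helper: map an ID string to one of 256 tags."""
    # Normalize so that formatting differences do not change the tag.
    return crc8(id_number.strip().upper().encode("utf-8"))


if __name__ == "__main__":
    # Made-up ID strings, purely for illustration.
    for dln in ("D1234567", "D1234568", "S7654321"):
        print(dln, format(disambiguation_tag(dln), "02x"))
```

Two records with different tags must come from different IDs; two records with the same tag only might be the same person, since unrelated IDs should match about 1 time in 256 if the tags are roughly uniform. Note that a CRC has no secret in it, so anyone holding a list of candidate IDs can compute the tags themselves and cut that list down by roughly a factor of 256.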
On Thu, Feb 6, 2014 at 4:10 PM, Chris Dary <umb...@gmail.com> wrote:
> Just one thought to throw out: Something that sprang to mind is the idea
> of a check digit or simplified hash that would be redundant enough to
> collide very often if you were trying to reverse, but would still provide
> enough disambiguation that you'd be able to appropriately determine who
> you're dealing with.
>
> You could probably use something similar to the Luhn algorithm for that,
> although I'm not sure how uniform that is:
> http://en.wikipedia.org/wiki/Luhn_algorithm - also, that only ends up
> with a single check digit, which is probably too small for good
> disambiguation. The approach in general might still be helpful though.
>
> -Chris
>
>
> On Thu, Feb 6, 2014 at 3:49 PM, Tom Lee <t...@sunlightfoundation.com> wrote:
>
>> We've been kicking around an idea at Sunlight that aims to use
>> cryptographic ideas to resolve some of the concerns around the publication
>> of publicly identifiable information in government disclosures. I could use
>> some smart people to tell me what's dumb about it.
>>
>> We often face challenges related to disambiguating entities: is the John
>> Smith who gave political donation A the same John Smith that gave political
>> donation B? One obvious solution to this problem is to push to expand the
>> information that's collected and disclosed -- if we had John's driver's
>> license number (DLN), for instance, it'd be easy to disambiguate these
>> records. But that could introduce privacy concerns for John. One approach
>> to this problem (which I don't think government has tried) is employing a
>> one-way hash.
>>
>> Obviously the input key space for DLNs and most other personal ID numbers
>> is so small that reversing this with a dictionary attack would be trivial.
>> You can add a salt, but only on a per-entity basis (not a per-record basis)
>> if you want to preserve the capacity to disambiguate. That in turn calls
>> for a lookup table in which the input keys are stored, which kind of
>> defeats the point of using a hash (you might as well just assign random
>> output IDs for each input ID). I would worry about government's ability to
>> keep this lookup table secure, and I worry about the brittleness of such a
>> system.
>>
>> Alternately, you can use a single system-wide secret (or set of secrets)
>> to transform inputs into reliable outputs. I think this is less brittle and
>> maybe easier to preserve as a secret, but this system might be too easily
>> reversible given the ability to observe its outputs and know the universe
>> of possible inputs. I'm unsure of the cryptographic options that might be
>> appropriate here.
>>
>> For all I know, the lack of implementations using this kind of one-way
>> transformation isn't about government sluggishness but rather about its
>> feasibility. I'd be very curious to hear folks' ideas on this score, though.
>> My general hunch is that something must be possible -- even a few bits'
>> worth of disambiguating information would be hugely useful to us, and
>> presumably you're not leaking important amounts of information by, say,
>> sharing the last digit of a DLN. So there must be a spectrum of options.
>> But as is probably apparent, I don't think I've got a handle on how to
>> think about this problem rigorously.
>>
>> Tom
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sunlightlabs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to sunlightlabs+unsubscr...@googlegroups.com.
>> To post to this group, send email to sunlightl...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/sunlightlabs.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
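On the "single system-wide secret" option in Tom's message quoted above, one way to picture it is a keyed hash truncated to a few bits. The sketch below uses HMAC-SHA256 from the Python standard library; that choice is mine, not something proposed in the thread, and `keyed_tag`, the sample secret, and the sample ID are all hypothetical.

```python
import hashlib
import hmac


def keyed_tag(id_number: str, secret: bytes, bits: int = 8) -> int:
    """Hypothetical helper: keyed, truncated tag for an ID string.

    HMAC-SHA256 is an assumed construction; the thread does not name one.
    """
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hmac.new(secret, id_number.encode("utf-8"), hashlib.sha256).digest()
    # Keep only the top `bits` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - bits)


if __name__ == "__main__":
    secret = b"example-system-wide-secret"  # placeholder, not a real key
    print(keyed_tag("D1234567", secret))          # 8-bit tag, 0..255
    print(keyed_tag("D1234567", secret, bits=4))  # 4-bit tag, 0..15
```

The same ID always gets the same tag under a given secret, so records can still be matched, and without the secret an outsider cannot recompute tags for a list of candidate IDs. Truncating to a few bits keeps collisions frequent, which limits what leaks even if the secret does get out.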