On Mon, Sep 9, 2013 at 7:15 PM, Hanno Schlichting
<hschlicht...@mozilla.com> wrote:
> On 09.09.2013, at 18:13 , Brian Smith <br...@briansmith.org> wrote:
>> On Mon, Sep 9, 2013 at 2:58 PM, Chris Peterson <cpeter...@mozilla.com> wrote:

> [T]here's one crucial difference between Google and us: We would
> like to make as much of this data public as possible, while Google will always
> just provide a service without access to the underlying data.

> We were looking for two things with using the sha1:
>
> - Make it possible for the end-user to change their unique value (they cannot 
> change the mac address, but they can change the ssid). This allows them to 
> "invalidate" historical records in the database.

There is friction in changing SSIDs as it affects every device that
would connect to that network. There will also probably not be much
awareness among users of when/why/how to do this or what effect it
will have. So, I think this aspect sounds great in theory, but in
practice it will almost never be used.

> - Make it harder for spammers to "guess" actual unique keys and flood our 
> service. Mac addresses have a vendor prefix, which makes it rather easy to 
> generate lots of valid mac addresses. Taking the ssid into account makes it 
> harder to generate valid keys. Unfortunately the ssid itself is considered 
> private data in European countries, so you aren't allowed to store it without 
> the user's consent. That's why Google and everyone else have stopped storing 
> them and only use mac addresses now.
>
> The sha1 scheme might be ineffective in doing this.

If x is private data then SHA1(x), SHA2(x), PBKDF2(x), and even
AES256(x, key) with a key known to you are all private data too.
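
One way to see why an unkeyed hash adds little protection: if x comes
from a small, guessable space, anyone can enumerate candidates and
compare digests. The sketch below is only an illustration under stated
assumptions. It assumes x is a bare MAC address written as lowercase
colon-separated hex and that the vendor prefix is known; the function
names and the `suffix_limit` parameter are hypothetical, and this is
not the scheme the service actually uses.

```python
import hashlib


def sha1_hex(mac):
    """Unkeyed SHA-1 of a MAC address string (assumed storage format)."""
    return hashlib.sha1(mac.encode("ascii")).hexdigest()


def recover_mac(target_digest, oui, suffix_limit=1 << 24):
    """Enumerate MACs under one vendor prefix until one matches the digest.

    oui is the 3-byte vendor prefix, e.g. "00:11:22". The remaining
    3 bytes give at most 2**24 candidates, which is a small search space.
    """
    for n in range(suffix_limit):
        mac = "%s:%02x:%02x:%02x" % (
            oui, (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF)
        if sha1_hex(mac) == target_digest:
            return mac
    return None
```

Hashing the MAC together with the SSID enlarges the search space, but
common or default SSIDs can be enumerated in the same way.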

>> Therefore, I'd suggest that you avoid using any protection at all, and
>> just use x instead of H(x) until we are very confident there is no way
>> we can further improve the filtering.
>
> This sounds like good advice and I'm starting to lean into this direction.
>
> But this only helps us on the "we provide a service" side. It's still unclear 
> to me if and how we could share any of this data as database dumps.

If you wanted to publish this data, and the data was stored in its raw
state, then you could always apply whatever mapping (SHA2, PBKDF2,
AES256 with a random, thrown-away key, etc.) right before you share
the data.
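
A minimal sketch of mapping at export time, under stated assumptions:
it uses HMAC-SHA256 with a fresh random key as a stand-in for "AES256
with a random, thrown-away key" (a different keyed primitive, chosen
only because it is in the Python standard library), and it assumes
records are (mac, ssid, lat, lon) tuples. The function name and record
layout are hypothetical.

```python
import hashlib
import hmac
import secrets


def export_pseudonymized(records):
    """Return (token, lat, lon) tuples for an iterable of raw records.

    The key is generated per export and never stored, so tokens from
    two different exports cannot be linked to each other by value.
    """
    key = secrets.token_bytes(32)  # discarded when this function returns
    out = []
    for mac, ssid, lat, lon in records:
        # A NUL separator keeps (mac, ssid) pairs from colliding by
        # concatenation; MACs are lowercased so case does not matter.
        msg = mac.lower().encode("utf-8") + b"\x00" + ssid.encode("utf-8")
        token = hmac.new(key, msg, hashlib.sha256).hexdigest()
        out.append((token, lat, lon))
    return out
```

Because the raw data stays in the database, the mapping can be changed
or strengthened for a later export without losing anything.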

Even if you use AES256 with a random, thrown-away key, the data will
be subject to reverse engineering. For example, one could correlate a
subset of the data with a separate database of known
(MAC,SSID,Location) triples, and/or attempt "traffic analysis" to see
relationships in how (MAC,SSID) pairs interact with each other with
respect to location. You have probably heard of the Netflix Prize
privacy issues [1]; this is a very similar problem. Therefore, while
it may be important to obscure the data before giving it to
researchers, we should still consider the obscured data to be highly
sensitive, confidential user data.
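
To make the correlation attack concrete, here is a rough sketch. It
assumes the published dump holds (token, lat, lon) rows and the
attacker holds a separate list of known (mac, ssid, lat, lon) rows;
the function name, the row layouts, and the `tolerance` parameter are
all hypothetical. The attack never touches the key or the cipher. It
only joins on location.

```python
def reidentify(dump, known, tolerance=1e-4):
    """Map opaque tokens back to (mac, ssid) by matching on location.

    A token is considered re-identified when exactly one known access
    point lies within `tolerance` degrees of its published position.
    """
    matches = {}
    for token, lat, lon in dump:
        candidates = [
            (mac, ssid)
            for mac, ssid, klat, klon in known
            if abs(lat - klat) <= tolerance and abs(lon - klon) <= tolerance
        ]
        if len(candidates) == 1:
            matches[token] = candidates[0]
    return matches
```

Coarsening or perturbing the published locations would make this
particular join harder, but at the cost of the data's usefulness.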

[1] http://en.wikipedia.org/wiki/Netflix_Prize#Privacy_concerns

Cheers,
Brian
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security
