I'm looking for some feedback on crypto privacy protections for a geolocation research project I'm working on with the Mozilla Services team. If you have general questions or suggestions about the project, I'm happy to answer them, but I'd like to focus this thread on crypto.

Our team is prototyping a crowd-sourced version of Google's Street View cars to correlate Wi-Fi access points and cell towers to GPS positions. Our primary motivation is to provide non-proprietary location services for Firefox OS devices. We would also like to publish this location data for researchers or other projects that might have novel uses for it.

Google's Location Service prevents people from tracking individual access points by requiring requests to include at least 2-3 access points that Google knows are near each other. This "proves" the requester is near the access points.

Below is a sketch of a scheme that I think will allow us to publish a database of access point locations while still requiring knowledge of two neighboring access points.

Unlike Google's Location Service, our server does not store MAC addresses or SSIDs. We identify access points by hash IDs, specifically SHA1(MAC+SSID). To query the location of an access point in the database, you must know both its MAC address and current SSID.
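
As a minimal sketch of how a client might compute such a hash ID: the function name, the lowercase MAC normalization, and the UTF-8 encoding below are my assumptions for illustration, not something we have settled on. Clients and server would need to agree on exactly one byte representation, or their hash IDs will not match.

```python
import hashlib

def ap_hash_id(mac: str, ssid: str) -> str:
    """Return SHA1(MAC + SSID) as a hex string.

    Assumption: the MAC is lowercased and the concatenation is
    encoded as UTF-8 before hashing.
    """
    data = (mac.lower() + ssid).encode("utf-8")
    return hashlib.sha1(data).hexdigest()

# Hypothetical access point, for illustration only.
h1 = ap_hash_id("00:11:22:33:44:55", "ExampleCafe")
```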

Our private database maps access point hash IDs to locations (and other metadata). Assuming:

    H1 = Hash(AP1.MAC + AP1.SSID)
    H2 = Hash(AP2.MAC + AP2.SSID)

Our private database's schema looks something like:

    Hash(AP1.MAC + AP1.SSID) ==> AP1.latitude, AP1.longitude, ...
    Hash(AP2.MAC + AP2.SSID) ==> AP2.latitude, AP2.longitude, ...

Our published database would include two tables. The first table would map a random row id to metadata about an anonymous access point:

    Random1 ==> AP1.latitude, AP1.longitude, ...
    Random2 ==> AP2.latitude, AP2.longitude, ...

The second table's primary key would be a hash of hashes. It would map a hash of two neighboring access points' hash IDs to a row id of the first table. Something like:

    Hash(H1 + H2) ==> Random1
    Hash(H2 + H1) ==> Random2

Someone querying the published database would need to know the MAC addresses and current SSIDs of two neighboring access points to look up either's location.
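
To make the two-table idea concrete, here is a rough sketch of building the published tables from the private database and then querying them. All function names, the use of `secrets.token_hex` for the random row ids, and the in-memory dicts are assumptions for illustration; a real deployment would use an actual database and whichever hash function we settle on.

```python
import hashlib
import secrets

def ap_hash_id(mac, ssid):
    # Hash ID of one access point: SHA1(MAC + SSID), assumed UTF-8.
    return hashlib.sha1((mac.lower() + ssid).encode("utf-8")).hexdigest()

def pair_key(h_target, h_neighbor):
    # Hash of hashes. Order matters: the key resolves to the FIRST AP's row.
    return hashlib.sha1((h_target + h_neighbor).encode("ascii")).hexdigest()

def build_published_tables(private_db, neighbor_pairs):
    """private_db: hash ID -> (lat, lon).
    neighbor_pairs: iterable of (hash ID, hash ID) known to be near each other.
    Returns (locations, pair_index) with no hash IDs in either table."""
    row_ids = {h: secrets.token_hex(16) for h in private_db}
    locations = {row_ids[h]: loc for h, loc in private_db.items()}
    pair_index = {}
    for h1, h2 in neighbor_pairs:
        pair_index[pair_key(h1, h2)] = row_ids[h1]  # Hash(H1 + H2) ==> Random1
        pair_index[pair_key(h2, h1)] = row_ids[h2]  # Hash(H2 + H1) ==> Random2
    return locations, pair_index

def lookup(locations, pair_index, target, neighbor):
    """target, neighbor: (mac, ssid) tuples. Returns (lat, lon) or None."""
    key = pair_key(ap_hash_id(*target), ap_hash_id(*neighbor))
    row_id = pair_index.get(key)
    return locations.get(row_id) if row_id is not None else None

# Hypothetical access points and coordinates, for illustration only.
ap1 = ("00:11:22:33:44:55", "ExampleCafe")
ap2 = ("66:77:88:99:aa:bb", "ExampleLibrary")
private_db = {ap_hash_id(*ap1): (1.0, 2.0), ap_hash_id(*ap2): (3.0, 4.0)}
neighbors = [(ap_hash_id(*ap1), ap_hash_id(*ap2))]
locations, pair_index = build_published_tables(private_db, neighbors)
```

In this sketch, `lookup(locations, pair_index, ap1, ap2)` resolves to AP1's location, while a query that pairs AP1 with an access point that is not a recorded neighbor finds no key and returns `None`.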

btw, should we use SHA-2 instead of SHA-1? In 2009, NIST recommended that "Federal agencies should stop using SHA-1 for applications that require collision resistance as soon as practical, and must use the SHA-2 family of hash functions for these applications after 2010."

Other layers of privacy protection include filtering out ad-hoc Wi-Fi networks; MAC addresses with vendor prefixes from mobile device manufacturers (e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g. "XXX's iPhone"); SSIDs carrying Google's "_nomap" opt-out suffix; and APs reported in multiple locations.
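
A rough sketch of what the per-observation filters could look like is below. The function name, the `is_adhoc` flag, and the SSID pattern are assumptions, and the vendor prefixes are made-up placeholders rather than real manufacturer assignments. The "reported in multiple locations" check is left out because it needs the history of earlier reports.

```python
import re

# Placeholder prefixes for illustration only; NOT real vendor assignments.
MOBILE_VENDOR_PREFIXES = {"aa:bb:cc", "dd:ee:ff"}

# Matches SSIDs such as "XXX's iPhone" (straight or curly apostrophe).
MOBILE_SSID_PATTERN = re.compile(r"['\u2019]s iphone$", re.IGNORECASE)

def should_drop(mac, ssid, is_adhoc, mobile_prefixes=MOBILE_VENDOR_PREFIXES):
    """Return True if this observation should be discarded before storage."""
    if is_adhoc:
        return True                      # ad-hoc network
    if mac.lower()[:8] in mobile_prefixes:
        return True                      # vendor prefix of a mobile device maker
    if ssid.endswith("_nomap"):
        return True                      # owner opted out via the "_nomap" suffix
    if MOBILE_SSID_PATTERN.search(ssid):
        return True                      # SSID looks like a phone hotspot
    return False
```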


thanks,
chris
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security