On Mon, Sep 9, 2013 at 2:58 PM, Chris Peterson <cpeter...@mozilla.com> wrote:
> Google's Location Service prevents people from tracking individual access
> points by requiring requests to include at least 2-3 access points that
> Google knows are near each other. This "proves" the requester is near the
> access points.

I assume that "prevents people from tracking individual access points"
means the following: Some people carry a personal access point with them
(e.g. in their phone). If somebody knows the SSID and MAC of this
personal access point, then they could track this person's location by
polling the database for that (SSID, MAC) pair. Google tries to limit
this type of abuse as much as practical while still providing a
location service based on such crowdsourced data.

> Unlike Google's Location Service, our server does not store MAC addresses or
> SSIDs. We identify access points by hash IDs, specifically SHA1(MAC+SSID).
> To query the location of an access point in the database, you must know both
> its MAC address and current SSID.

MAC addresses are only 48 bits, and SSIDs are often guessable or
predictable, so the input space is small enough to enumerate.
Therefore, using H(MAC+SSID) instead of just the plain MAC+SSID is
not buying you much in terms of privacy, IMO. Basically, if you are
really trying to use this as a privacy mechanism then you should store
the MAC+SSID according to best practices for storing passwords. For
example, use PBKDF2 with a large number of iterations. Regardless of
whether you use SHA1, SHA2, PBKDF2, or something else, I will still
call whatever function you use H(x). But, I am not sure that switching
to PBKDF2 even buys you much improved privacy protection.
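
To make the brute-force concern concrete, here is a rough sketch. It
assumes MAC+SSID means the 6 raw MAC bytes followed by the UTF-8 SSID,
which the proposal does not specify. The vendor prefix, the SSID, the
fixed salt, and all function names are made up for illustration. The
salt has to be fixed because H(x) is used as a lookup key, and that is
part of why PBKDF2 only raises the per-guess cost here.

```python
import hashlib


def h_sha1(mac: bytes, ssid: str) -> bytes:
    """Plain hash: SHA1(MAC + SSID), with an assumed byte encoding."""
    return hashlib.sha1(mac + ssid.encode("utf-8")).digest()


def h_pbkdf2(mac: bytes, ssid: str, iterations: int = 100_000) -> bytes:
    """Slow hash. The salt is fixed (hypothetical value) so that the
    result can still serve as a deterministic lookup key."""
    return hashlib.pbkdf2_hmac(
        "sha256", mac + ssid.encode("utf-8"), b"fixed-public-salt", iterations
    )


def brute_force_mac(target: bytes, vendor_prefix: bytes, ssid: str,
                    suffix_bits: int = 24):
    """Recover a MAC from its hash, given a guessed 3-byte vendor prefix
    and a guessed SSID, by trying every value of the remaining bits."""
    for n in range(1 << suffix_bits):
        mac = vendor_prefix + n.to_bytes(3, "big")
        if h_sha1(mac, ssid) == target:
            return mac
    return None


# Hypothetical victim: made-up vendor prefix and a guessable SSID.
prefix = b"\xaa\xbb\xcc"
secret_mac = prefix + b"\x00\x12\x34"
leaked = h_sha1(secret_mac, "linksys")

# Searching only 16 bits keeps the demo fast.
recovered = brute_force_mac(leaked, prefix, "linksys", suffix_bits=16)
```

With a known vendor prefix, the full search is 2^24 (about 16.7
million) hash computations per guessed SSID. Swapping h_sha1 for
h_pbkdf2 multiplies the cost of each guess by the iteration count, but
the number of guesses stays the same.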

>     H1 = Hash(AP1.MAC + AP1.SSID)
>     H2 = Hash(AP2.MAC + AP2.SSID)
>
> Our private database's schema looks something like:
>
>     Hash(AP1.MAC + AP1.SSID) ==> AP1.latitude, AP1.longitude, ...
>     Hash(AP2.MAC + AP2.SSID) ==> AP2.latitude, AP2.longitude, ...
>
> Our published database would include two tables. The first table would map a
> random row id to metadata about an anonymous access point:
>
>     Random1 ==> AP1.latitude, AP1.longitude, ...
>     Random2 ==> AP2.latitude, AP2.longitude, ...
>
> The second table's primary key would be a hash of hashes. It would map a
> hash of two neighboring access points' hash IDs to a row id of the first
> table. Something like:
>
>     Hash(H1 + H2) ==> Random1
>     Hash(H2 + H1) ==> Random2
>
> Someone querying the published database would need to know the MAC addresses
> and current SSIDs of two neighboring access points to look up either's
> location.

If you know the MAC+SSID of person X's personal access point and the
MAC+SSID of person Y's personal access point, then you can use this
database to ask the question "are person X and person Y in the same
location?" This seems bad. I see that you attempt to address this
below.
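
A small sketch of that co-location query, as I understand the proposed
schema. SHA-256 as the hash, the byte encoding of MAC+SSID, the 8-byte
random row ids, and all names here are my assumptions, not something
the proposal specifies.

```python
import hashlib
import secrets


def H(mac: bytes, ssid: str) -> bytes:
    """Per-AP hash ID, assuming raw MAC bytes followed by UTF-8 SSID."""
    return hashlib.sha256(mac + ssid.encode("utf-8")).digest()


def pair_key(h_first: bytes, h_second: bytes) -> bytes:
    """Primary key of the second published table: Hash(H1 + H2)."""
    return hashlib.sha256(h_first + h_second).digest()


def publish(neighbor_pairs):
    """Build the second table for APs that were observed together.
    Each ordered pair maps to a random row id of the first table."""
    table = {}
    for ap1, ap2 in neighbor_pairs:
        h1, h2 = H(*ap1), H(*ap2)
        table[pair_key(h1, h2)] = secrets.token_hex(8)  # row id for AP1
        table[pair_key(h2, h1)] = secrets.token_hex(8)  # row id for AP2
    return table


def seen_together(table, ap_x, ap_y) -> bool:
    """The attacker's question: were X's and Y's APs observed as neighbors?"""
    return pair_key(H(*ap_x), H(*ap_y)) in table


# Hypothetical personal access points of three people.
x = (b"\x02\x00\x00\x00\x00\x01", "X phone")
y = (b"\x02\x00\x00\x00\x00\x02", "Y phone")
z = (b"\x02\x00\x00\x00\x00\x03", "Z phone")

table = publish([(x, y)])
```

Anyone who knows both (MAC, SSID) pairs can call seen_together() on the
published table without knowing anything else about either location.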

> btw, should we use SHA-2 instead of SHA-1?

There is no reason to use SHA-1 when you have SHA-2 available.
However, as I indicated above, it isn't clear it is a good idea to be
using any plain hash function as H(x).

> Other layers of privacy protection include filtering out ad-hoc Wi-Fi
> networks; MAC addresses with vendor prefixes from mobile device manufacturers
> (e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g.
> "XXX's iPhone" and Google's "_nomap" opt-out); and APs reported in multiple
> locations.

I think that these things are much more important than the protection
offered by H(x). My concern is that if you store the data on the
server as H(x) then you will not be able to do the above filtering on
the server unless H(x) is ineffective. That seems bad, because the
server will be much easier to update to improve the filtering than the
clients will be, AFAICT. Also, you will not be able to measure the
effectiveness of the privacy protections on the server, which is also
very bad.
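
To illustrate why this filtering wants the plain values, here is a
minimal server-side sketch. The vendor prefixes are placeholders, not
real assignments, and the function name and SSID pattern are only my
guess at what such a filter might look like. None of these checks can
run against H(x) alone.

```python
import re

# Placeholder vendor prefixes (hypothetical, not real assignments).
MOBILE_VENDOR_PREFIXES = {"aa:bb:cc", "dd:ee:ff"}

# SSIDs like "XXX's iPhone", with a straight or a curly apostrophe.
MOBILE_SSID_PATTERN = re.compile(r".+['\u2019]s iPhone$", re.IGNORECASE)


def should_drop(mac: str, ssid: str, is_adhoc: bool) -> bool:
    """Return True if the observation should be filtered out."""
    if is_adhoc:
        return True  # ad-hoc Wi-Fi networks
    if mac.lower()[:8] in MOBILE_VENDOR_PREFIXES:
        return True  # vendor prefix of a mobile device manufacturer
    if ssid.endswith("_nomap"):
        return True  # Google's "_nomap" opt-out
    if MOBILE_SSID_PATTERN.match(ssid):
        return True  # SSID commonly associated with a mobile device
    return False
```

Updating the prefix list or the pattern on the server is a one-line
change, but it only helps if the server still has the MAC and SSID to
test against.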

Therefore, I'd suggest that you avoid using any protection at all, and
just use x instead of H(x) until we are very confident there is no way
we can further improve the filtering.

Cheers,
Brian Smith
-- 
Mozilla Networking/Crypto/Security (Necko/NSS/PSM), NSA plant
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security
