On 9/9/13 6:13 PM, Brian Smith wrote:
I assume that "prevents people from tracking individual access points"
means the following: Some people have a personal access point on them
(e.g. in their phone). If somebody knows the SSID and MAC of this
personal access point, then they could track this person's location by
polling the database for that (SSID, MAC) pair.

Tracking a person's movements by polling the database would not be useful because we would probably update the database infrequently (days or weeks). The location database would be generated offline from analysis of many raw measurements submitted by the stumbler app.

The tracking scenario that might be viable is a tracker who knows someone's MAC address and current SSID, where that person then moves to a different city or state. In that case the database delay wouldn't matter as much. The hash-of-hashes scheme tries to protect against that by requiring two neighboring APs.


MAC addresses are 48 bits. SSIDs are often guessable or predictable.
Therefore, using H(MAC+SSID) instead of just the plain MAC+SSID is
not buying you much in terms of privacy, IMO. Basically, if you are
really trying to use this as a privacy mechanism then you should store
the MAC+SSID according to best practices for storing passwords. For
example, use PBKDF2 with a large number of iterations. Regardless of
whether you use SHA1, SHA2, PBKDF2, or something else, I will still
call whatever function you use H(x). But, I am not sure that switching
to PBKDF2 even buys you much improved privacy protection.

The primary motivation for hashing the MAC+SSID was to avoid uploading the SSID (which is considered private data in some European countries) while still using the SSID as a sort of weak protection against "database pollution" from malicious stumblers reporting spoofed MAC addresses. Even if our database were filled with junk MAC addresses, real clients would probably not see the same combination of MAC and SSID in the real world when they sent a geolocation request to the server.
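
For illustration, the two flavors of H(x) discussed above might look roughly like this in Python. This is only a sketch: the function names, the MAC normalization, the concatenation order, the salt, and the iteration count are all assumptions made for the example, not what the stumbler actually does.

```python
import hashlib


def normalize_mac(mac: str) -> str:
    """Lower-case the MAC and strip separators so equal addresses hash equally."""
    return mac.replace(":", "").replace("-", "").lower()


def h_fast(mac: str, ssid: str) -> str:
    """H(MAC+SSID) as a single SHA-1 pass; cheap to compute and cheap to guess."""
    data = normalize_mac(mac).encode("ascii") + ssid.encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def h_slow(mac: str, ssid: str,
           salt: bytes = b"hypothetical-global-salt",  # assumed, not from the thread
           iterations: int = 100_000) -> str:          # assumed work factor
    """H(MAC+SSID) via PBKDF2-HMAC-SHA256; every guess costs `iterations` rounds."""
    data = normalize_mac(mac).encode("ascii") + ssid.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, salt, iterations).hex()
```

Note that the salt in this sketch has to be the same for every client, because the server looks records up by the hash value. Unlike per-user password salts, it only raises the cost per guess and does not stop an attacker from precomputing hashes for guessable SSIDs.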


Other layers of privacy protection include filtering out ad-hoc Wi-Fi
networks; MAC addresses with vendor prefixes from mobile device manufacturers
(e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g.
"XXX's iPhone" and Google's "_nomap" opt-out); and APs reported in multiple
locations.

I think that these things are much more important than the protection
offered by H(x). My concern is that if you store the data on the
server as H(x) then you will not be able to do the above filtering on
the server unless H(x) is ineffective. That seems bad, because the
server will be much easier to update to improve the filtering than the
clients will be, AFAICT. Also, you will not be able to measure the
effectiveness of the privacy protections on the server, which is also
very bad.

Very good points. We are currently filtering on the stumbler client side. Today, the server just receives mystery hashes with latitude and longitude.

Given just MAC addresses, the server could still filter out ad-hoc networks; vendor prefixes for known mobile device manufacturers; and unrecognized vendor prefixes (because some mobile devices supposedly generate completely random MAC addresses).
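
A rough sketch of what that server-side MAC filtering could look like, in Python. The vendor prefixes below are made-up placeholders, not real manufacturer assignments, and treating the "locally administered" bit as the marker for ad-hoc or randomized addresses is an assumption of this example.

```python
# Hypothetical vendor prefixes (first three octets); a real list would come
# from the IEEE registry plus our own observations.
HYPOTHETICAL_MOBILE_OUIS = {"001122"}
HYPOTHETICAL_KNOWN_OUIS = {"001122", "0c3344"}


def normalize_mac(mac: str) -> str:
    """Lower-case the MAC and strip separators."""
    return mac.replace(":", "").replace("-", "").lower()


def is_locally_administered(mac: str) -> bool:
    """True if the locally-administered bit (0x02 of the first octet) is set."""
    first_octet = int(normalize_mac(mac)[:2], 16)
    return bool(first_octet & 0x02)


def should_drop(mac: str) -> bool:
    """Apply the three MAC-only filters: ad-hoc, mobile vendor, unknown vendor."""
    prefix = normalize_mac(mac)[:6]
    if is_locally_administered(mac):        # assumed ad-hoc or randomized address
        return True
    if prefix in HYPOTHETICAL_MOBILE_OUIS:  # known mobile device manufacturer
        return True
    return prefix not in HYPOTHETICAL_KNOWN_OUIS  # unrecognized vendor prefix
```

None of this works on H(MAC+SSID), which is the trade-off raised above: the server needs the plain MAC to apply these checks.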

We would still need to rely on the stumbler to filter SSIDs. We can't upload SSIDs to the server because they are considered private data in some European countries (though MAC addresses, which are more unique, are apparently not considered private data, in a legal sense).

We've compiled a list of about 70 SSID prefixes and suffixes we've seen from mobile devices (e.g. "Android*", "Verizon *", or "*'s iPhone"). Not all of these mobile devices use ad-hoc MAC addresses.
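
As a sketch of how the stumbler-side SSID check could work, here is a small Python example using shell-style wildcards. It includes only the three patterns quoted above plus the "_nomap" suffix; the names `MOBILE_SSID_PATTERNS` and `is_filtered_ssid` are made up for the example, and the real list is much longer.

```python
from fnmatch import fnmatchcase

# Illustrative subset of the pattern list; "*" matches any run of characters.
MOBILE_SSID_PATTERNS = [
    "Android*",
    "Verizon *",
    "*'s iPhone",
    "*_nomap",  # Google's opt-out suffix
]


def is_filtered_ssid(ssid: str) -> bool:
    """True if the SSID matches any mobile-device or opt-out pattern."""
    return any(fnmatchcase(ssid, pattern) for pattern in MOBILE_SSID_PATTERNS)
```

For example, `is_filtered_ssid("Chris's iPhone")` and `is_filtered_ssid("HomeNet_nomap")` are both true, while `is_filtered_ssid("CoffeeShop")` is false. The match is case-sensitive here, which is a choice of this sketch.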

Trivia: over a couple years of my own Wi-Fi stumbling/wardriving in three countries and six US states, I have recorded over 100K unique APs and only eight used Google's "_nomap" SSID opt-out suffix!


chris
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security
