On 9/9/13 6:13 PM, Brian Smith wrote:
I assume that "prevents people from tracking individual access points"
means the following: Some people have a personal access point on them
(e.g. in their phone). If somebody knows the SSID and MAC of this
personal access point, then they could track this person's location by
polling the database for that (SSID, MAC) pair.
Tracking a person's movements by polling the database would not be
useful because we would probably update the database infrequently (days
or weeks). The location database would be generated offline from
analysis of many raw measurements submitted by the stumbler app.
The tracking scenario that might be viable is one where a tracker knows
someone's MAC address and current SSID, and that person moves to a
different city or state. The database delay wouldn't matter as much. The
hash of hashes scheme tries to protect against that by requiring two
neighboring APs.
MAC addresses are 48 bits. SSIDs are often guessable or predictable.
Therefore, using the H(MAC+SSID) instead of just the plain MAC+SSID is
not buying you much in terms of privacy, IMO. Basically, if you are
really trying to use this as a privacy mechanism then you should store
the MAC+SSID according to best practices for storing passwords. For
example, use PBKDF2 with a large number of iterations. Regardless of
whether you use SHA1, SHA2, PBKDF2, or something else, I will still
call whatever function you use H(x). But, I am not sure that switching
to PBKDF2 even buys you much improved privacy protection.
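To put rough numbers on the argument above, here is a minimal Python sketch.
The MAC, SSID, salt, and iteration count are made-up values, and the way MAC
and SSID are concatenated is an assumption; the thread does not specify the
actual H(x). The salt is fixed on the assumption that clients must be able to
recompute the same key for a lookup.

```python
import hashlib

# Hypothetical example values, not taken from any real access point.
mac = "00:11:22:33:44:55"
ssid = "linksys"


def h_sha1(mac: str, ssid: str) -> str:
    """Fast hash of MAC+SSID (the concatenation format is an assumption)."""
    return hashlib.sha1((mac + ssid).encode("utf-8")).hexdigest()


def h_pbkdf2(mac: str, ssid: str, iterations: int = 100_000) -> str:
    """Slow hash of MAC+SSID using PBKDF2-HMAC-SHA256.

    The salt is a fixed, public placeholder (assumption): a lookup key
    has to be reproducible by every client, so it cannot be random per entry.
    """
    data = (mac + ssid).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, b"example-salt", iterations).hex()


# Search space an attacker faces for one guessed SSID.
full_space = 2 ** 48      # every possible 48-bit MAC address
known_vendor = 2 ** 24    # remaining MACs once the 24-bit vendor prefix is known

print(h_sha1(mac, ssid))
print(h_pbkdf2(mac, ssid))
print(f"{full_space:,} candidate MACs, or {known_vendor:,} with a known vendor prefix")
```

With a guessable SSID and a known vendor prefix, only about 16.7 million
candidates remain per SSID, so a slower H(x) raises the cost per guess without
changing the overall picture much.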
The primary motivation for hashing the MAC+SSID was to avoid uploading
the SSID (which is considered private data in some European countries)
while still using the SSID as sort of weak protection against "database
pollution" from malicious stumblers reporting spoofed MAC addresses.
Even if our database were filled with junk MAC addresses, real clients
would probably not see the same combination of MAC and SSID in the real
world when they send a geolocation request to the server.
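A small Python sketch of that pollution argument, assuming the database is
keyed by a SHA-1 of MAC+SSID (the real hash and key format are not given in
this thread, and all values below are invented):

```python
import hashlib


def ap_key(mac: str, ssid: str) -> str:
    """Lookup key H(MAC+SSID); SHA-1 and the concatenation are assumptions."""
    return hashlib.sha1((mac.lower() + ssid).encode("utf-8")).hexdigest()


# Hypothetical database: key -> (latitude, longitude).
database = {}

# A malicious stumbler reports a spoofed MAC but has to guess the SSID.
database[ap_key("00:11:22:33:44:55", "guessed-ssid")] = (0.0, 0.0)

# A real client later sees the same MAC with its actual SSID.
client_key = ap_key("00:11:22:33:44:55", "HomeNet")

# The junk entry is never returned because the keys differ.
print(database.get(client_key))
```

The junk record only does harm if the attacker also guesses the SSID that
real clients observe for that MAC, which is why this is weak protection rather
than none.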
Other layers of privacy protection include filtering out ad-hoc Wi-Fi
networks; MAC addresses with vendor prefixes from mobile device manufacturers
(e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g.
"XXX's iPhone" and Google's "_nomap" opt-out); and APs reported in multiple
locations.
I think that these things are much more important than the protection
offered by H(x). My concern is that if you store the data on the
server as H(x) then you will not be able to do the above filtering on
the server unless H(x) is ineffective. That seems bad, because the
server will be much easier to update to improve the filtering than the
clients will be, AFAICT. Also, you will not be able to measure the
effectiveness of the privacy protections on the server, which is also
very bad.
Very good points. We are currently filtering on the stumbler client
side. Today, the server just receives mystery hashes with latitude and
longitude.
Given just MAC addresses, the server could still filter out ad-hoc
networks; vendor prefixes for known mobile device manufacturers; and
unrecognized vendor prefixes (because some mobile devices supposedly
generate completely random MAC addresses).
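As a rough illustration of MAC-only filtering on the server, here is a Python
sketch. The vendor prefixes are placeholders rather than real assignments, the
function names are hypothetical, and treating the "locally administered" bit
as the signal for ad-hoc or randomized addresses is an assumption; the thread
does not say how ad-hoc networks are detected.

```python
# Placeholder vendor prefixes (first three octets); not real assignments.
MOBILE_VENDOR_PREFIXES = {"00:aa:bb", "00:cc:dd"}
KNOWN_VENDOR_PREFIXES = MOBILE_VENDOR_PREFIXES | {"00:11:22"}


def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' (or with '-') into six raw octets."""
    parts = mac.lower().replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def should_drop(mac: str) -> bool:
    """Return True if a report for this MAC should be filtered out."""
    octets = parse_mac(mac)
    prefix = ":".join(f"{octet:02x}" for octet in octets[:3])
    if octets[0] & 0x02:
        # Locally administered address: assumed ad-hoc or randomized.
        return True
    if prefix in MOBILE_VENDOR_PREFIXES:
        return True
    if prefix not in KNOWN_VENDOR_PREFIXES:
        # Unrecognized vendor prefix: possibly a random MAC.
        return True
    return False


for candidate in ("02:00:00:00:00:01", "00:aa:bb:01:02:03", "00:11:22:33:44:55"):
    print(candidate, should_drop(candidate))
```

Because this needs only the plain MAC, it could run on the server and be
updated there without shipping a new stumbler build.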
We would still need to rely on the stumbler to filter SSIDs. We can't
upload SSIDs to the server because they are considered private data in
some European countries (though MAC addresses, which are more unique,
are apparently not considered private data, in a legal sense).
We've compiled a list of about 70 SSID prefixes and suffixes we've seen
from mobile devices (e.g. "Android*", "Verizon *", or "*'s iPhone"). Not
all of these mobile devices use ad-hoc MAC addresses.
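A minimal sketch of what the client-side SSID filter could look like in
Python. Only the three patterns quoted above and the "_nomap" suffix come from
this thread; the function name and the use of shell-style wildcards are
assumptions, not the stumbler's actual code.

```python
from fnmatch import fnmatchcase

# The three example patterns quoted in this thread; the full list is longer.
MOBILE_SSID_PATTERNS = ["Android*", "Verizon *", "*'s iPhone"]
OPT_OUT_SUFFIX = "_nomap"


def should_skip_ssid(ssid: str) -> bool:
    """Return True if the stumbler should not report this SSID's AP."""
    if ssid.endswith(OPT_OUT_SUFFIX):
        # The owner opted out via the "_nomap" suffix.
        return True
    # Case-sensitive wildcard match against the known mobile patterns.
    return any(fnmatchcase(ssid, pattern) for pattern in MOBILE_SSID_PATTERNS)


for name in ("Chris's iPhone", "AndroidAP", "HomeNet_nomap", "HomeNet"):
    print(name, should_skip_ssid(name))
```

Since the SSID never leaves the device, this check has to stay in the
stumbler, and changing the pattern list means a client update.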
Trivia: over a couple years of my own Wi-Fi stumbling/wardriving in
three countries and six US states, I have recorded over 100K unique APs
and only eight used Google's "_nomap" SSID opt-out suffix!
chris
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security