> Using cells to hash wifis could be a real issue but it would be a safe
> method. (so you need at least one wifi and one cell)

A hash of cell ID + BSSID is relatively easy to attack by brute force. Assuming an attacker has a list of cells and their locations (if not from Mozilla, then from one of the other projects), he can filter by location and thus obtain all cells in his area of interest. An attacker could filter for one single provider and network type to further reduce the number of cells to try. In my neighborhood I hit a new cell every 1-2 km, so the coverage area of a cell is somewhere in the 1-4 km² range there, and cells are even sparser in less populated areas. As with my earlier suggestion of salting BSSID hashes with an approximate location, the number of cell towers is not big enough to be a barrier to brute forcing. To protect against stalkers, our data would probably need to withstand a brute force attack for somewhere between 1 and 10 years.

> but what about 2 wifi aps but no cell? (too rare or do we need to use
> wifis to hash other wifis, too?)

That has been proposed before and would effectively give the same behavior as the online APIs: you need to know two nearby BSSIDs to get a location. In terms of security, this is probably sufficient. The issue here is that we would again store not one hash per WiFi, but one for each possible pair of neighboring WiFis. A WiFi which has ten neighbors would then have ten records in the database, each with a different hash. In areas with many WiFis, such as large apartment blocks, the number of neighbors for each WiFi is even larger. Again, this would bloat the database.

> Is it possible to compress such a hash database?

Compression would probably not help a lot. Each record would consist of a hash, latitude and longitude. A lat/lon pair, accurate to 10^-5 degrees (about 1 m at the equator), has about 49 bits of entropy and can be encoded in 7 bytes. The even bigger part is the hash. Like encrypted data, the output of a cryptographic hash has a high amount of entropy and thus does not compress well.
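
As a rough check of the coordinate part of a record, here is a minimal Python sketch that counts the grid points at a given resolution and converts that count to bits and bytes. The function names and the 256-bit hash length are assumptions made only for illustration, not part of any proposed format.

```python
import math

# Assumed hash length for illustration only (e.g. a 256-bit digest).
ASSUMED_HASH_BITS = 256


def latlon_bits(resolution_deg):
    """Bits needed to distinguish every lat/lon grid point at this resolution."""
    lat_steps = 180 / resolution_deg  # latitude spans -90..+90 degrees
    lon_steps = 360 / resolution_deg  # longitude spans -180..+180 degrees
    return math.log2(lat_steps * lon_steps)


def latlon_bytes(resolution_deg):
    """Whole bytes needed to store one lat/lon pair at this resolution."""
    return math.ceil(latlon_bits(resolution_deg) / 8)


def record_bytes(resolution_deg, hash_bits=ASSUMED_HASH_BITS):
    """Size of one (hash, lat, lon) record, ignoring any storage overhead."""
    return hash_bits // 8 + latlon_bytes(resolution_deg)


if __name__ == "__main__":
    for res in (1e-4, 1e-5):
        print(res, round(latlon_bits(res), 1), latlon_bytes(res), record_bytes(res))
```

At 10^-4 degrees this comes to roughly 42.6 bits (6 bytes), and at 10^-5 degrees to roughly 49.2 bits (7 bytes). With the assumed 256-bit hash, a record at 10^-5 degrees would be 39 bytes, most of it hash.
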
The hash length depends on the hash function used and is in the three-digit range of bits (for example, 160 bits for SHA-1 or 256 bits for SHA-256).

> how accurate do we want to make the lat/lon coordinates?
> (street-/city-/.../countrylevel) or as accurate as possible?

If we can solve the privacy issue, there is probably little reason to cap accuracy, but a good reason to keep it as high as possible: some applications may require a certain degree of accuracy. Indoor navigation in a mall will not work with city-level accuracy.

In fact, accuracy will be limited by technical characteristics of the system: we use GPS for reference, which under ideal conditions has an error margin of 10 m (i.e. there is a >=95% likelihood that the actual location is within 10 m of the measured location). WiFis typically have a range of 2 km in theory, though physical obstacles (most notably building structures) can reduce this by a factor of 10 or more, and we have to guess the actual position based on a few measurements scattered across the range. How accurately we can determine the location of a single WiFi depends largely on the algorithm used and the quality of the measurements available. If we are very lucky we might get close to a 10 m error range; therefore I would suggest a resolution of 1 m for the data returned.

> If we want to hash coordinates (lat/lon) we could use geohash to do so
> instead of md5:
> http://en.wikipedia.org/wiki/Geohash

That would be an efficient way of storing it. However, when I referred to "hashing" in the database I was referring to a cryptographic hash, based on the BSSID and some other information, and designed in such a way that it is computationally infeasible to reconstruct the "ingredients" given the hash.

In my earlier comment on encrypting lat/lon I was talking about a way to prevent users from filtering the database by geographic region.
That means in this case we'd be talking about encryption (reversible with the matching key). But before we discuss that in more detail, I would first like to hear other people's opinions on whether filtering a downloadable database by location is an issue.

_______________________________________________
dev-geolocation mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-geolocation
