Re: Idea for database download

Felix Baumann Sun, 13 Jul 2014 07:55:31 -0700

Oh :) I forgot that I need to use the context menu to reply to the list in Thunderbird, so I only sent my last reply in this discussion to Michael's email address.

This is what he replied to me:

On 13.07.2014 15:43, Michael von Glasow wrote:

Hi Felix,
Is it intentional that your reply went to me personally but not to the list? If not, feel free to forward my reply to the list.
On 11/07/14 19:18, Felix Baumann wrote:
If we use hashes, then we could reduce the time needed to check all of them by filtering out inappropriate hashes beforehand. Only entries in the same time zone, country or in range of the device's IP would then be checked.
Does that mean the downloadable database should include information such as country or time zone? Also, I don't quite understand what you mean by "in range of the device's IP". WiFi-based geolocation uses the BSSID, i.e. the MAC address, of access points, which is one layer lower than IP. In most cases we never obtain any IP addresses from these access points, as most of them are secured: we can see and identify them but not connect to them. If we know their coordinates, however, we can use their identification data for geolocation.
While I would sort out details on optimizing the performance of a database search later and for now focus on the question of how to provide a database download without sacrificing privacy, I believe looking up a hash in a database is a fairly straightforward operation, and the extra cost of filtering data would largely cancel out the benefits of a quicker lookup.
The principle of a hash, as previously discussed here, is to prevent lookups based on the BSSID alone. Records are not identified by the plain BSSID but by a salted hash of the BSSID. The salt would be a piece of extra information that is easy to obtain when one is in the vicinity of the BSSID but hard to guess otherwise. Therefore, if I want to look up my position and have that extra information, I can easily calculate the hashes of nearby BSSIDs and look up their position. However, if I want to stalk someone and want to use their BSSID to determine where they have moved, I would need to guess all possible hashes for their BSSID. Our design goal is to make that guesswork impractical.
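To make the salted-hash lookup concrete, here is a minimal sketch. It assumes SHA-256 and a salt passed in as an opaque string; the thread has not settled on a hash function or on what the salt should be, so the function names, the salt value and the coordinates below are purely hypothetical.

```python
import hashlib


def salted_bssid_hash(bssid: str, salt: str) -> str:
    """Hash a BSSID together with a salt (SHA-256 is an assumption)."""
    # Normalize the notation so "00-11-..." and "00:11:..." hash identically.
    normalized = bssid.lower().replace("-", ":")
    return hashlib.sha256(f"{salt}|{normalized}".encode("utf-8")).hexdigest()


# Hypothetical downloadable database: salted hash -> (lat, lon).
# The plain BSSID never appears as a key.
database = {
    salted_bssid_hash("00:11:22:33:44:55", "local-context-A"): (48.0, 11.0),
}


def lookup(bssid: str, salt: str):
    """Return the stored position, or None if the hash is not in the database."""
    return database.get(salted_bssid_hash(bssid, salt))


print(lookup("00:11:22:33:44:55", "local-context-A"))  # device near the access point
print(lookup("00:11:22:33:44:55", "wrong-guess"))      # BSSID known, salt unknown
```

Someone who sees the access point and knows the local salt gets a position; someone who only knows the BSSID has to try every candidate salt.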
Now if we had a way to pre-filter data, I would worry that this would reduce the number of records which a malicious user would have to search, thus making their life easier, without providing much of an improvement for legitimate uses.
Taking paranoia one step further, I would even start wondering if any possibility to filter the database based on coordinates could be useful for rogue users. A stalker might be able to narrow down the area in which their would-be victim is likely to be found to a country, state or even just a metropolitan area, then filter the database for the relevant records and be able to operate on a much smaller set of data.
Here Sam's proposal would really come in handy, as it gives out lat/lon individually but never in pairs, thus making that kind of filtering less effective. Filtering data can only be done by lat/lon boundaries, ruling out more sophisticated constructs such as polygons, and filtered data would still contain a lot of records whose latitude is inside the target area but the longitude is not, or vice versa. Another option would be to use not only salted hashes, but also encrypted coordinates (the salt for the hash and the encryption key could be derived from the same information).
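A toy illustration of why unpaired coordinates blunt area filtering. It assumes that "individually" means latitudes and longitudes are published as separate lists that cannot be joined back into pairs; that is only one reading of Sam's proposal, and the 10 × 10 grid of access points is made up.

```python
# Hypothetical ground truth: 100 access points on a 10 x 10 grid.
points = [(lat, lon) for lat in range(10) for lon in range(10)]

# What a download would expose under the assumption above:
# two separate lists with no way to pair the values up again.
latitudes = [lat for lat, _ in points]
longitudes = [lon for _, lon in points]


def in_range(values, low, high):
    """Boundary filter on a single coordinate axis."""
    return [v for v in values if low <= v <= high]


# Target area: latitude 2..3 and longitude 2..3.
lat_hits = in_range(latitudes, 2, 3)
lon_hits = in_range(longitudes, 2, 3)

# Only someone holding the paired data can compute this set.
truly_inside = [(la, lo) for la, lo in points if 2 <= la <= 3 and 2 <= lo <= 3]

print(len(lat_hits), len(lon_hits), len(truly_inside))
```

In this toy data only 4 access points lie inside the target box, but each single-axis filter keeps 20 values, most of them belonging to points outside it.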
I'm not sure whether we need to hash cells, but if not, we could use the nearby cells to get an even more accurate position.
I don't think we need to hash cells. They contain no private data, and the general consensus seems to be that cells and their locations can be given out in plain.
I had considered using nearby cells for the salt. There are situations in which there is just one WiFi in range, but in most cases there will be a cellular connection. There are, however, two issues with cells:
Many devices on the market don't expose all cells in range through their API, so a geolocation service running on the device might only be able to obtain the currently serving cell. That means we need to keep multiple hashes for each WiFi, one for each cell whose range it touches.
Most locations, especially in urban areas, are in the range of multiple cells. At my home, I frequently get handed over between three different cells of my carrier, and these are just the 3G cells. There are also 2G cells and 4G cells in range. And, finally, the area in which I live is served by four different carriers (a huge share of countries have somewhere between 2 and 4 carriers). That means my home WiFi would need 36 different hashes (3 cells × 3 standards × 4 carriers), which would bloat the database. Even a more conservative estimate (2 cells, 2 standards, 1 carrier) would still require four different hashes for one BSSID.
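The arithmetic behind these two estimates can be sketched as follows, assuming one database record per (cell, BSSID) combination with the cell identifier used as the salt. SHA-256, the cell identifier format and the carrier names are hypothetical.

```python
import hashlib
from itertools import product


def hashes_needed(cells_per_standard: int, standards: int, carriers: int) -> int:
    """Records required for one BSSID if every cell in range yields its own salt."""
    return cells_per_standard * standards * carriers


def hashes_for_wifi(bssid: str, cell_ids) -> set:
    """One salted hash per cell whose range touches the access point."""
    return {
        hashlib.sha256(f"{cell}|{bssid}".encode("utf-8")).hexdigest()
        for cell in cell_ids
    }


# Made-up identifiers for the home example: 4 carriers, 3 standards, 3 cells each.
cell_ids = [
    f"{carrier}-{standard}-{n}"
    for carrier, standard, n in product("ABCD", ("2G", "3G", "4G"), range(3))
]

print(hashes_needed(3, 3, 4))                              # the home example
print(hashes_needed(2, 2, 1))                              # the conservative estimate
print(len(hashes_for_wifi("00:11:22:33:44:55", cell_ids))) # records for one BSSID
```

The record count grows with the product of the three factors, which is why even a modest number of cells in range multiplies the database size.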


Some new questions of mine:

Using cells to hash WiFis could be a real issue, but it would be a safe method (so you need at least one WiFi and one cell). But what about two WiFi APs and no cell? Is that too rare, or do we need to use WiFis to hash other WiFis, too?


Is it possible to compress such a hash database?

How often would such a database download be updated? (once per week?)

How accurate do we want to make the lat/lon coordinates (street, city, ... country level), or as accurate as possible?


Regards,
Felix
_______________________________________________
dev-geolocation mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-geolocation
