Oops :) I forgot that in Thunderbird I need to use the context menu to
reply to the list, so my last reply to this discussion went only to
Michael's email address.
This is what he replied to me:
On 13.07.2014 15:43, Michael von Glasow wrote:
Hi Felix,
Is it by intention that your reply went to me personally but not to
the list? If not, feel free to forward my reply to the list.
On 11/07/14 19:18, Felix Baumann wrote:
If we use hashes, then we could reduce the time needed to check all of
them by filtering out inappropriate hashes beforehand. Only entries
in the same time zone, country or in range of the device's IP would be
checked then.
Does that mean the downloadable database should include information
such as country or time zone? Also, I don't quite understand what you
mean by "in range of the device's IP". WiFi-based geolocation uses the
BSSID, i.e. the MAC address, of access points, which is one layer
lower than IP. In most cases we never obtain any IP addresses from
these access points, as most of them are secured – we can see and
identify them but not connect to them, but if we know their
coordinates, we can use their identification data for geolocation.
While I would sort out details on optimizing the performance of a
database search later and for now focus on the question of how to
provide a database download without sacrificing privacy, I believe
looking up a hash in a database is a fairly straightforward operation,
and the extra cost of filtering data would largely cancel out the
benefits of a quicker lookup.
The principle of a hash, as previously discussed here, is to prevent
lookups based on the BSSID alone. Records are not identified by the
plain BSSID but a salted hash of the BSSID. The salt would be a piece
of extra information that is easy to obtain when one is in the
vicinity of the BSSID but hard to guess otherwise. Therefore, if I
want to look up my position and have that extra information, I can
easily calculate the hashes of nearby BSSIDs and look up their
position. However, if I want to stalk someone and want to use their
BSSID to determine where they have moved, I would need to guess all
possible hashes for their BSSID. Our design goal is to make that
guesswork impractical.
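To make the idea concrete, here is a minimal sketch of such a lookup. It is only an illustration under assumptions, not an agreed design: SHA-256, the `record_key` and `lookup` helpers, the format of the salt string and the sample coordinates are all made up for this example.

```python
import hashlib

def normalize_bssid(bssid: str) -> bytes:
    """Lower-case the MAC address and strip separators so equal BSSIDs hash equally."""
    return bssid.lower().replace(":", "").replace("-", "").encode("ascii")

def record_key(bssid: str, salt: str) -> str:
    """Database key for a BSSID: SHA-256 over the salt and the normalized BSSID.

    The salt stands for the 'extra information' that is easy to obtain near
    the access point (assumption: a string identifying something observable
    nearby).
    """
    h = hashlib.sha256()
    h.update(salt.encode("utf-8"))
    h.update(b"|")
    h.update(normalize_bssid(bssid))
    return h.hexdigest()

# Hypothetical downloadable database: salted hash -> (lat, lon).
# The plain BSSID is never stored.
database = {
    record_key("00:11:22:33:44:55", "nearby-info-A"): (48.1374, 11.5755),
}

def lookup(bssid: str, salt: str):
    """Return the stored position, or None if this (BSSID, salt) pair is unknown."""
    return database.get(record_key(bssid, salt))
```

A device that sees the access point and knows the salt gets a hit from `lookup("00:11:22:33:44:55", "nearby-info-A")`. Someone who knows only the BSSID has to try every possible salt, so the protection depends entirely on how hard the salt is to guess.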
Now if we had a way to pre-filter data, I would worry that this would
reduce the number of records which a malicious user would have to
search, thus making their life easier, without providing much of an
improvement for legitimate uses.
Taking paranoia one step further, I would even start wondering if any
possibility to filter the database based on coordinates could be
useful for rogue users. A stalker might be able to narrow down the
area in which their would-be victim is likely to be found to a
country, state or even just a metropolitan area, then filter the
database for the relevant records and be able to operate on a much
smaller set of data.
Here Sam's proposal would really come in handy, as it gives out
lat/lon individually but never in pairs, thus making that kind of
filtering less effective. Filtering data can only be done by lat/lon
boundaries, ruling out more sophisticated constructs such as polygons,
and filtered data would still contain a lot of records whose latitude
is inside the target area but the longitude is not, or vice versa.
Another option would be to use not only salted hashes, but also
encrypted coordinates (the salt for the hash and the encryption key
could be derived from the same information).
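A rough sketch of what that could look like, with the lookup key and the coordinate key derived from the same BSSID and salt. Everything here is an assumption for illustration: the `derive`, `make_record` and `read_record` names, the use of HMAC-SHA-256 with different labels, and the simple XOR with a 16-byte derived pad, which is a toy stand-in for a properly reviewed cipher.

```python
import hashlib
import hmac
import struct

def derive(bssid: bytes, salt: bytes, label: bytes) -> bytes:
    """Derive 32 bytes from BSSID and salt; the label keeps the two uses apart."""
    return hmac.new(salt, label + b"|" + bssid, hashlib.sha256).digest()

def make_record(bssid: bytes, salt: bytes, lat: float, lon: float):
    """Build one database record: (lookup key, encrypted coordinates)."""
    key_id = derive(bssid, salt, b"lookup").hex()
    pad = derive(bssid, salt, b"coords")[:16]
    plain = struct.pack(">dd", lat, lon)            # two doubles = 16 bytes
    cipher = bytes(p ^ k for p, k in zip(plain, pad))
    return key_id, cipher

def read_record(bssid: bytes, salt: bytes, cipher: bytes):
    """Recover (lat, lon) from a record; only works with the right BSSID and salt."""
    pad = derive(bssid, salt, b"coords")[:16]
    plain = bytes(c ^ k for c, k in zip(cipher, pad))
    return struct.unpack(">dd", plain)

bssid = b"001122334455"
key_id, blob = make_record(bssid, b"nearby-info-A", 48.1374, 11.5755)
print(read_record(bssid, b"nearby-info-A", blob))   # the original coordinates
```

With a scheme like this the database would contain no plain coordinates for WiFi records, so filtering by area would no longer be possible without already knowing the BSSID and salt of each record. Note that the XOR pad gives no integrity protection; a real design would need a vetted construction.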
I'm not sure whether we need to hash cells, but if not, we could use
the nearby cells to get an even more accurate position.
I don't think we need to hash cells – they contain no private data,
and the general consensus seems to be that cells and their locations
can be given out in plain.
I had considered using nearby cells for the salt – there are
situations in which there is just one WiFi in range, but in most cases
there will be a cellular connection. There are, however, two issues
with cells:
Many devices on the market don't expose all cells in range through
their API, so a geolocation service running on the device might only
be able to obtain the currently serving cell. That means we need to
keep multiple hashes for each WiFi – one for each cell whose range it
touches.
Most locations, especially in urban areas, are in the range of
multiple cells. At my home, I frequently get handed over between three
different cells of my carrier – and these are just the 3G cells. There
are also 2G cells and 4G cells in range. And, finally, the area in
which I live is served by four different carriers (a huge share of
countries have somewhere between 2 and 4 carriers). That means my home
WiFi would need 36 different hashes (3 cells × 3 standards × 4
carriers), which would bloat the database. Even a more conservative
estimate (2 cells, 2 standards, 1 carrier) would still require four
different hashes for one BSSID.
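The arithmetic above can be checked with a few lines of code. This is just a sketch of the counting: the `salt_candidates` helper and the placeholder cell, standard and carrier names are invented for the example and do not correspond to real identifiers.

```python
from itertools import product

def salt_candidates(cells, standards, carriers):
    """One salt per (carrier, standard, cell) combination that covers the WiFi."""
    return [f"{carrier}/{standard}/{cell}"
            for carrier, standard, cell in product(carriers, standards, cells)]

# My home: 3 cells per standard, 3 standards, 4 carriers.
home = salt_candidates(
    cells=["cell1", "cell2", "cell3"],
    standards=["2G", "3G", "4G"],
    carriers=["A", "B", "C", "D"],
)
print(len(home))          # 36

# More conservative estimate: 2 cells, 2 standards, 1 carrier.
conservative = salt_candidates(
    cells=["cell1", "cell2"],
    standards=["2G", "3G"],
    carriers=["A"],
)
print(len(conservative))  # 4
```

Each entry in the list would need its own hashed record for the same BSSID, so the database grows by that factor for every WiFi in such an area.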
Some new questions of mine:
Using cells to hash WiFis could be a real issue, but it would be a safe
method (so you need at least one WiFi and one cell).
But what about two WiFi APs and no cell? (Is that too rare, or do we
need to use WiFis to hash other WiFis, too?)
Is it possible to compress such a hash database?
How often would such a database download be updated? (Once per week?)
How accurate do we want to make the lat/lon coordinates?
(Street/city/.../country level, or as accurate as possible?)
Regards,
Felix
_______________________________________________
dev-geolocation mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-geolocation