On 8/26/14 11:14 AM, Hanno Schlichting wrote:
Hi.
It’s been a long time in the planning, but we are finally getting
closer to actually making the aggregated cell network data available
for download.
We have worked with the OpenCellID project to agree on a new shared
export format, to make it easier for anyone using either of our two
data sources. The details of the new data format are documented at
http://mozilla-ichnaea.readthedocs.org/en/latest/import_export.html.
As a concrete example, I’ve exported some of the most recent cell
networks from our live database. You can get the sample at:
https://www.dropbox.com/s/vcmjuozhv0fjpmm/MLS-diff-cell-export-2014-08-26T130000.csv.gz?dl=0
The data is licensed under CC-0 terms, so has neither copyright nor
database right restrictions
(https://creativecommons.org/publicdomain/zero/1.0/).
If you haven’t followed the github issues about this topic, now’s
your time to share feedback and concerns.
If all goes well, we are looking at adding a public downloads section
to the website next Tuesday and making all of our cell network data
available.
Best, Hanno
_______________________________________________
dev-geolocation mailing list [email protected]
https://lists.mozilla.org/listinfo/dev-geolocation
Hello all,
Hanno's email was timely since I was about to send a mail asking about
this API effort. I do have "Feedback and Concerns"; here are some.
At the process level, a week before launch is *very* late to be asking
for feedback on a public API! The GeoLocation Web API and the Mozilla
Location Service upload API would both have benefited from some careful,
structured review to avoid their bad structure and naming. I guess the
discussion on this API was all happening on GitHub issues and not on
this list. For something 'a long time in the planning' though, this
public notice is sadly late.
The Internet end of line separator is [carriage-return, line feed] in
the IETF standards for text-based protocols. RFC 3676, for example,
says:
    "The Text/Plain media type is the lowest common denominator of
    Internet email, with lines of no more than 998 characters (by
    convention usually no more than 78), and where the carriage-return
    and line-feed (CRLF) sequence represents a line break (see [MIME-IMT]
    and [MSG-FMT])."
    http://tools.ietf.org/html/rfc3676
This started with email, was kept for HTTP, and, absent strong reasons
to change it, sticking with the standard is best policy.
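A minimal sketch of what this means for the export, assuming field values are plain numbers or strings with no commas, quotes, or line breaks (so no CSV quoting is needed). The function name and the sample record are hypothetical:

```javascript
// Sketch only: serialize records as CSV with CRLF line endings.
// Assumes values never contain commas, quotes, or line breaks.
const CRLF = '\r\n';

function toCsv(header, rows) {
  const lines = [header.join(',')];
  for (const row of rows) {
    lines.push(header.map((name) => String(row[name])).join(','));
  }
  // Every line, including the last one, is terminated by CRLF.
  return lines.join(CRLF) + CRLF;
}

// Hypothetical sample record using two of the proposed column names.
const sampleCsv = toCsv(['mcc', 'net'], [{ mcc: 262, net: 2 }]);
```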
The proposed API needs work: the semantics are a mishmash and the naming
is terrible. The page name 'import_export' is in direct conflict with
the API structure, which appears to have been thought out only for
export. Here I only discuss export because developing an API that can
serve both will take way more time and we might as well start somewhere.
Nonetheless, the clearer the name and documentation, the more reusable
the element.
Semantically, the API is offering a set of individual data records each
of which consists of:
* a set of labels which jointly identify an individual antenna
* known properties of the antenna
* measurements of that antenna
* estimated properties of that antenna
* record metadata
Unfortunately, neither the names nor the documentation properly
separates out these roles.
Let's walk the proposal:
'mcc' okay but -> 'mobCountryId' to match others
'net' bad name -> 'mobProviderId'
'area' bad name -> 'mobAreaId'
'cell' bad name -> 'mobCellId'
'unit' bad name -> 'mobSubUnitId'
Come on! 'net', 'area', 'cell', 'unit' have generic meaning in the world
that has nothing to do with your API. Put a little more effort into
your naming, please! Save your users some headaches.
All of these seem, from what I can tell, to be code identifiers which,
JOINTLY, label the specific radio antenna which is the subject of the
data record. Semantically we really have
'antennaId' : 'mcc'&'net'&'area'&'cell'&'unit'
but here we use five fields instead of one. Fine, but this needs clear
documentation. In a JSON API these would properly be jointly in a
sub-structure but since this is a flat API we just need clarity in the
documentation stating that jointly these identifiers will provide a
unique label for each record.
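To make the "jointly unique" requirement concrete, here is a rough sketch that joins the five fields into one key and checks a set of records for collisions. The helper names and the ':' separator are my own assumptions; nothing in the proposed format defines them:

```javascript
// Hypothetical helpers: the field list follows the proposal, the rest is assumed.
const ID_FIELDS = ['mcc', 'net', 'area', 'cell', 'unit'];

function antennaKey(record) {
  // Missing fields become empty strings so that field positions stay fixed.
  return ID_FIELDS
    .map((f) => (record[f] === undefined || record[f] === null ? '' : String(record[f])))
    .join(':');
}

function findDuplicateKeys(records) {
  const seen = new Set();
  const duplicates = new Set();
  for (const record of records) {
    const key = antennaKey(record);
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return [...duplicates];
}
```

If the documentation promises uniqueness, findDuplicateKeys should return an empty array for any export file.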
Ideally, these names would all have form 'id...' but English places its
adjectives first 'red car' (versus 'voiture rouge' in French or 'coche
rojo' in Spanish) so we end up with a structure '...Id'. My proposed
shared prefix 'mob' helps clarify these fields are all similar and work
jointly.
'radio' bad name -> 'radioClass' or some such
The current MozLocService item upload has a similar crappy naming
approach where each item has a 'radio' element but then each observed
cell in the item also has its own 'radio' element, of course with
different data. So I have to have this ridiculous lookup object:
var CELL_TYPE_LOOKUP = {
  // reported type: [cell.radio, item.radio]
  'type':  ['cell.radio', 'item.radio'], // Header field
  'gsm':   ['gsm', 'gsm'],    // 2G GSM
  'edge':  ['gsm', 'gsm'],    // 2G EDGE
  'gprs':  ['gsm', 'gsm'],    // 2G GPRS
  'umts':  ['umts', 'gsm'],   // 3G UMTS
  'hspa':  ['umts', 'gsm'],   // 3.5G HSPA
  'hsdpa': ['umts', 'gsm'],   // 3.5G HSDPA
  'hspa+': ['umts', 'gsm'],   // 3.5G HSPA+
  'hsupa': ['umts', 'gsm'],   // 3.5G HSUPA
  'cdma':  ['cdma', 'cdma'],  // 2G CDMA
  'is95a': ['cdma', 'cdma'],  // 2G CDMA
  'is95b': ['cdma', 'cdma'],  // 2G CDMA
  '1xrtt': ['cdma', 'cdma'],  // 2G CDMA
  'evdo0': ['cdma', 'cdma'],  // 3G CDMA
  'evdoa': ['cdma', 'cdma'],  // 3G CDMA
  'evdob': ['cdma', 'cdma'],  // 3G CDMA
  'ehrpd': ['cdma', 'cdma'],  // 4G CDMA
  'lte':   ['lte', 'gsm']     // 4G LTE
};
to generate what is required. I take it this proposed API element is the
middle column. I have taken to naming the first column 'radioType', the
second 'radioClass', and the third 'radioFamily' but these names are
arbitrary. First you need to decide on your name and then you need a
bunch more documentation providing essentially this lookup table to
explain this to users.
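For illustration, this is roughly how such a table gets used. The sketch carries only a reduced copy of the table, and the function name and the returned property names (my arbitrary 'radioType', 'radioClass', 'radioFamily') are assumptions, not part of either API:

```javascript
// Reduced, illustrative copy of the lookup: reported type -> [radioClass, radioFamily].
const RADIO_LOOKUP = {
  gsm: ['gsm', 'gsm'],
  umts: ['umts', 'gsm'],
  hspa: ['umts', 'gsm'],
  evdo0: ['cdma', 'cdma'],
  lte: ['lte', 'gsm'],
};

function classifyRadio(radioType) {
  const key = String(radioType).toLowerCase();
  // Own-property check so names like 'constructor' are not treated as radio types.
  if (!Object.prototype.hasOwnProperty.call(RADIO_LOOKUP, key)) return null;
  const [radioClass, radioFamily] = RADIO_LOOKUP[key];
  return { radioType: key, radioClass, radioFamily };
}
```

Unknown types return null rather than a guess, which leaves the decision about unrecognized radios to the caller.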
'lon'
'lat'
The documentation should mention the Coordinate Reference System for
these as being the CRS used by the GPS system, i.e. WGS84. "The prime
meridian is 0 degrees" is a tautology---that's what 'prime meridian'
means. More properly, this could be "The Prime Meridian (with value 0
degrees) is the IERS Reference Meridian, close to, but not the same as,
the Greenwich Airy Meridian."
https://en.wikipedia.org/wiki/World_Geodetic_System
http://spatialreference.org/ref/epsg/4326/
'changeable' terrible name
As far as I can tell, this only applies to the location of the antenna
so the name needs to be linked to the position. From the consumer
standpoint, the only thing interesting is how the position has been
'determined': either defined or estimated, and if the latter probably
the user wants some notion of how it was estimated. This could be done
in a single field or in two, depending on what you want
-> 'posEstimationMethod' DEFINED || CENTROID || ALGO_6
or
-> 'posDetermination' DEFINED || ESTIMATED
-> 'posEstimationAlgo' MEASURED || CENTROID || ...
The best way, given the variety of algorithms possible, would be to
define a few and then use an HTTP URI (i.e. a URL) for the rest, where
the link is to a web page with the description of the estimation
algorithm or process. Otherwise the documentation needs some indication
of how the position estimates were derived.
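A sketch of how a consumer could check such a field, assuming the single-field variant with two predefined tokens and an HTTP(S) URI for everything else. The token set, the function name, and the example URL are all hypothetical:

```javascript
// Assumed predefined tokens; a real specification would fix this list.
const KNOWN_METHODS = new Set(['DEFINED', 'CENTROID']);

function isValidEstimationMethod(value) {
  if (KNOWN_METHODS.has(value)) return true;
  try {
    // Anything else must be an absolute http(s) URL describing the algorithm.
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch (err) {
    return false; // neither a known token nor a parseable URL
  }
}
```

For example, isValidEstimationMethod('https://example.org/algo/6') would be accepted, while a free-form word would be rejected.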
Are you punting completely on giving any estimate of the accuracy of the
position? I would expect a
'posAccuracy'
giving a 95% CI radius around the observation since that is the crucial
factor which makes the position usable or not. (The only other element
of your API that would let me guess as to the quality of the data would
be the number of observations but this does not let me know if they were
all in a line or were well distributed spatially.) Since the service,
which has all the data, is the only one who can properly make this
estimate, it seems this should be generated for each record.
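As one possible reading of 'posAccuracy', here is a sketch that computes the radius around the estimated position containing 95% of the observations. It is an empirical percentile, not a formal confidence interval on the antenna location, and it assumes a spherical Earth and that the raw observations are available as {lat, lon} objects. All names are my own:

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius; spherical approximation

function toRad(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance in metres between two WGS84 points (haversine formula).
function haversineMeters(lat1, lon1, lat2, lon2) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

function radius95(position, observations) {
  if (observations.length === 0) return null;
  const dists = observations
    .map((o) => haversineMeters(position.lat, position.lon, o.lat, o.lon))
    .sort((x, y) => x - y);
  // Nearest-rank percentile: smallest distance covering at least 95% of observations.
  const rank = Math.ceil(0.95 * dists.length);
  return dists[rank - 1];
}
```

A radius like this still says nothing about whether the observations were all in a line, so it complements rather than replaces a description of their spatial spread.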
'range' bad name -> 'rangeEstimate'
Conceptually, this is an estimate of the distance at which the signal
level drops below some particular strength, perhaps usable strength. So
the documentation should explain that. Of course, for different radio
technologies the threshold strength is probably different, so what is
this really? Is this a property of the radioClass or is this an estimate
based on the observations?
'samples' bad name -> 'obsNumber' or 'numObs' or 'numSamples'
The name 'samples' suggests it is the samples themselves but it is
actually a number. The text says it is the number of observations used
to determine the position but we have already seen the position might
have been defined. So the documentation needs to be clear what other
entries are based on these observations: i.e. the 'range' or
'averageSignal'.
'averageSignal' -> !?
Ouch. Hmm. What is this telling us about? Is this to help us estimate
the quality of the observations or to help us estimate the quality of
the position estimate? 'Max', 'Median', and 'Min' might help with the
former; some kind of referent of 'MaxEverForRadioClass' and
'MinEverForRadioClass' in the documentation would be needed for the
latter. A straight mathematical average for a 2D spatial estimate is
crazy problematic to interpret a posteriori so I am really not sure what
this is supposed to provide users. Some clarity about the usage of this
number and its behaviour in the field is needed in the documentation.
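For the 'Max', 'Median', 'Min' variant, a minimal sketch, assuming the per-observation signal strengths are available as an array of numbers (for example in dBm). The function name and return shape are hypothetical:

```javascript
function summarizeSignal(values) {
  if (values.length === 0) return null;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}
```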
'created'
'updated'
Are these purely database modification times or are these related to the
observations? If the latter, 'firstObserved' and 'lastObserved' would be
better names.
Why make it an ambiguous timestamp, when you can make it an unambiguous
ISO 8601 Date (e.g. 2014-07-24T12:16:36Z)?
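The conversion is cheap. A sketch, assuming the current values are Unix timestamps in whole seconds (the documentation would need to confirm that); the function name is hypothetical:

```javascript
function unixToIso(seconds) {
  // toISOString() always reports UTC; drop the ".000" millisecond part.
  return new Date(seconds * 1000).toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Round trip for the date used as the example above (months are zero-based).
const exampleSeconds = Date.UTC(2014, 6, 24, 12, 16, 36) / 1000;
const exampleIso = unixToIso(exampleSeconds);
```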
This is not the API I would have expected.
Without one or a few ways to estimate the accuracy of the position,
these records are of little use for positioning. Without a richer
description of the spatial structure of the observations, like bounding
boxes or partial bounding boxes, these records are of little use in
defining the quality of the overall database. So we are left with being
able to get summary records which neither provide a well defined
estimate of position and other values nor provide a rich summary of the
data. As it stands, this API encourages direct, uncritical use of the
positions; since OpenCellId estimates several antennae as being in the
middle of the ocean, this is not great.
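As an example of the kind of spatial summary I mean, here is a sketch of a per-record bounding box over the observations. It assumes the raw observations are available as {lat, lon} objects and it ignores cells that straddle the antimeridian; the names are my own:

```javascript
function boundingBox(observations) {
  if (observations.length === 0) return null;
  let minLat = Infinity;
  let maxLat = -Infinity;
  let minLon = Infinity;
  let maxLon = -Infinity;
  for (const { lat, lon } of observations) {
    minLat = Math.min(minLat, lat);
    maxLat = Math.max(maxLat, lat);
    minLon = Math.min(minLon, lon);
    maxLon = Math.max(maxLon, lon);
  }
  // Note: boxes crossing longitude +/-180 would need special handling.
  return { minLat, minLon, maxLat, maxLon };
}
```

A consumer could then flag any record whose estimated position falls outside its own bounding box.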
Have you developed a set of usage examples for this API? Are those
written up somewhere? What is the goal of such usages? I have a
difficult time guessing as to the motivations which led to such an API.
~adrian