On 8/26/14 11:14 AM, Hanno Schlichting wrote:
Hi.

It’s been a long time in the planning, but we are finally getting
closer to actually making the aggregated cell network data available
for download.

We have worked with the OpenCellID project to agree on a new shared
export format, to make it easier for anyone using either of our two
data sources. The details of the new data format are documented at
http://mozilla-ichnaea.readthedocs.org/en/latest/import_export.html.

As a concrete example, I’ve exported some of the most recent cell
networks from our live database. You can get the sample at:
https://www.dropbox.com/s/vcmjuozhv0fjpmm/MLS-diff-cell-export-2014-08-26T130000.csv.gz?dl=0

The data is licensed under CC-0 terms, so it has neither copyright nor
database right restrictions
(https://creativecommons.org/publicdomain/zero/1.0/).

If you haven’t followed the github issues about this topic, now’s
your time to share feedback and concerns.

If all goes well, we are looking at adding a public downloads section
to the website next Tuesday and making all of our cell network data
available.

Best,
Hanno
_______________________________________________
dev-geolocation mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-geolocation


Hello all,

Hanno's email was timely since I was about to send a mail asking about this API effort. I do have "Feedback and Concerns"; here are some.



At the process level, a week before launch is *very* late to be asking for feedback on a public API! The GeoLocation Web API and the Mozilla Location Service upload API both would have benefited from some good, structured logic to avoid their bad structure and naming. I guess the discussion on this API was all happening on GitHub issues and not on this list. For something 'a long time in the planning' though, this public notice is sadly late.



The Internet end-of-line separator is [carriage-return, line-feed], as per the IETF standards for text. For example:
   The Text/Plain media type is the lowest common denominator of
   Internet email, with lines of no more than 998 characters (by
   convention usually no more than 78), and where the carriage-return
   and line-feed (CRLF) sequence represents a line break (see [MIME-IMT]
   and [MSG-FMT]).
                           http://tools.ietf.org/html/rfc3676
This started with email, was kept for HTTP, and, absent strong reasons to change it, sticking with the standard is the best policy.
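
As a rough sketch of what this would mean for the export file: the helper name `toCsvLines` below is hypothetical (it is not part of any existing Mozilla or OpenCellID tool), and it assumes fields contain no commas, quotes or line breaks, so no quoting is attempted. Each record is simply terminated with CRLF.

```javascript
// Hypothetical helper: serialise rows of plain values into CSV text
// using CRLF line breaks. Assumes no field needs quoting.
var CRLF = '\r\n';

function toCsvLines(rows) {
    return rows.map(function (row) {
        // Join the fields of one record, then terminate it with CRLF.
        return row.join(',') + CRLF;
    }).join('');
}

// Illustrative values only; the column names follow the current proposal.
var sample = toCsvLines([
    ['radio', 'mcc', 'net'],
    ['gsm', 262, 1]
]);
```

Terminating every record, including the last one, keeps the file easy to concatenate with other exports.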



The proposed API needs work: the semantics are a mishmash and the naming is terrible. The page name 'import_export' is in direct conflict with the API structure, which appears to have been thought out only for export. Here I only discuss export because developing an API that can serve both will take way more time and we might as well start somewhere. Nonetheless, the clearer the name and documentation, the more reusable the element.


Semantically, the API is offering a set of individual data records, each of which consists of:
  *a set of labels which jointly identify an individual antenna
  *known properties of the antenna
  *measurements of that antenna
  *estimated properties of that antenna
  *record metadata
Unfortunately, neither the names nor the documentation properly separates out these roles.


Let's walk the proposal:



  'mcc'    okay but -> 'mobCountryId' to match others
  'net'    bad name -> 'mobProviderId'
  'area'   bad name -> 'mobAreaId'
  'cell'   bad name -> 'mobCellId'
  'unit'   bad name -> 'mobSubUnitId'

Come on! 'net', 'area', 'cell', 'unit' have generic meaning in the world that has nothing to do with your API. Put in a little more effort to your naming, please! Save your users some headaches.

All of these seem, from what I can tell, to be code identifiers which, JOINTLY, label the specific radio antenna which is the subject of the data record. Semantically we really have
  'antennaId' : 'mcc'&'net'&'area'&'cell'&'unit'
but here we use five fields instead of one. Fine, but this needs clear documentation. In a JSON API these would properly sit together in a sub-structure, but since this is a flat API we just need clarity in the documentation stating that jointly these identifiers will provide a unique label for each record.
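
To make the "jointly" point concrete, here is a minimal sketch of how a consumer might combine the five fields into one label. The helper name, the '-' separator and the sample values are all assumptions for illustration; nothing here is defined by the proposed format.

```javascript
// Field names as used in the current proposal.
var ID_FIELDS = ['mcc', 'net', 'area', 'cell', 'unit'];

// Hypothetical helper: join the five identifier fields of a record into
// one composite label. The '-' separator is an arbitrary choice.
function antennaId(record) {
    return ID_FIELDS.map(function (field) {
        var value = record[field];
        // Keep an empty slot for a missing identifier so that the
        // positions of the remaining parts stay unambiguous.
        return (value === undefined || value === null) ? '' : String(value);
    }).join('-');
}

// Made-up record without a 'unit' value.
var id = antennaId({ mcc: 262, net: 1, area: 4711, cell: 12345 });
// id is '262-1-4711-12345-'
```

Whether a missing 'unit' should count as part of the label is exactly the kind of thing the documentation needs to state.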

Ideally, these names would all have the form 'id...', but English places its adjectives first ('red car' versus 'voiture rouge' in French or 'coche rojo' in Spanish), so we end up with the structure '...Id'. My proposed shared prefix 'mob' helps clarify that these fields are all similar and work jointly.




  'radio'  bad name -> 'radioClass' or some such

The current MozLocService item upload has a similar crappy naming approach where each item has a 'radio' element but then each observed cell in the item also has its own 'radio' element, of course with different data. So I have to have this ridiculous lookup object:
var CELL_TYPE_LOOKUP = {
    'type':   ['cell.radio', 'item.radio'],//Header field

    'gsm':    ['gsm',      'gsm'], //2G GSM
    'edge':   ['gsm',      'gsm'], //2G EDGE
    'gprs':   ['gsm',      'gsm'], //2G GPRS
    'umts':   ['umts',     'gsm'], //3G UMTS
    'hspa':   ['umts',     'gsm'], //3.5G HSPA
    'hsdpa':  ['umts',     'gsm'], //3.5G HSDPA
    'hspa+':  ['umts',     'gsm'], //3.5G HSPA+
    'hsupa':  ['umts',     'gsm'], //3.5G HSUPA

    'cdma':   ['cdma',     'cdma'], //2G CDMA
    'is95a':  ['cdma',     'cdma'], //2G CDMA
    'is95b':  ['cdma',     'cdma'], //2G CDMA
    '1xrtt':  ['cdma',     'cdma'], //2G CDMA
    'evdo0':  ['cdma',     'cdma'], //3G CDMA
    'evdoa':  ['cdma',     'cdma'], //3G CDMA
    'evdob':  ['cdma',     'cdma'], //3G CDMA
    'ehrpd':  ['cdma',     'cdma'], //4G CDMA

    'lte':    ['lte',      'gsm']   //4G LTE
};
to generate what is required. I take it the proposed 'radio' element corresponds to the middle column. I have taken to naming the first column 'radioType', the second 'radioClass', and the third 'radioFamily', but these names are arbitrary. First you need to decide on your name and then you need a bunch more documentation providing essentially this lookup table to explain this to users.
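
For illustration, this is roughly how such a table ends up being used. The sketch repeats a few rows of the table so that it runs on its own; `LOOKUP_EXCERPT` and `classifyRadio` are hypothetical names, and 'radioClass' / 'radioFamily' are only my arbitrary labels from above.

```javascript
// Excerpt of the lookup table, repeated so the sketch is self-contained:
// radioType -> [radioClass (cell.radio), radioFamily (item.radio)]
var LOOKUP_EXCERPT = {
    'edge':  ['gsm',  'gsm'],
    'hspa':  ['umts', 'gsm'],
    'evdo0': ['cdma', 'cdma'],
    'lte':   ['lte',  'gsm']
};

// Hypothetical helper: resolve a reported radio type to both values.
// Returns null for types that are not in the table.
function classifyRadio(radioType) {
    var key = String(radioType).toLowerCase();
    // Own-property check so inherited names like 'constructor' do not match.
    if (!Object.prototype.hasOwnProperty.call(LOOKUP_EXCERPT, key)) {
        return null;
    }
    var entry = LOOKUP_EXCERPT[key];
    return { radioClass: entry[0], radioFamily: entry[1] };
}

var hspa = classifyRadio('HSPA');
```

Every consumer has to carry a table like this around unless the documentation spells out the mapping.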





  'lon'
  'lat'

The documentation should mention the Coordinate Reference System for these as being the CRS used by the GPS system, i.e. WGS84. "The prime meridian is 0 degrees" is a tautology---that's what 'prime meridian' means. More properly, this could be "The Prime Meridian (with value 0 degrees) is the IERS Reference Meridian, close to, but not the same as, the Greenwich Airy Meridian."
    https://en.wikipedia.org/wiki/World_Geodetic_System
    http://spatialreference.org/ref/epsg/4326/


  'changeable' terrible name

As far as I can tell, this only applies to the location of the antenna, so the name needs to be linked to the position. From the consumer's standpoint, the only interesting thing is how the position has been 'determined': either defined or estimated and, if the latter, the user probably wants some notion of how it was estimated. This could be done in a single field or in two, depending on what you want:

               -> 'posEstimationMethod' DEFINED || CENTROID || ALGO_6
or
               -> 'posDetermination'    DEFINED || ESTIMATED
               -> 'posEstimationAlgo'   MEASURED || CENTROID || ...


The best way, given the variety of algorithms possible, would be to define a few and then use an HTTP URI (i.e. a URL) for the rest, where the link is to a web page with the description of the estimation algorithm or process. Otherwise the documentation needs some indication of how the position estimates were derived.

Are you punting completely on giving any estimate of the accuracy of the position? I would expect a

  'posAccuracy'

giving a 95% CI radius around the observation since that is the crucial factor which makes the position usable or not. (The only other element of your API that would let me guess as to the quality of the data would be the number of observations but this does not let me know if they were all in a line or were well distributed spatially.) Since the service, which has all the data, is the only one who can properly make this estimate, it seems this should be generated for each record.
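
To show the kind of number I mean, here is a crude sketch that takes the centroid of the observations and reports the radius containing 95% of them. This is only a stand-in for a proper confidence interval, and it is not a description of how the service estimates anything; the function names are hypothetical, it assumes at least one observation, and it ignores the antimeridian.

```javascript
var EARTH_RADIUS_M = 6371000; // mean Earth radius, spherical approximation

function toRad(deg) {
    return deg * Math.PI / 180;
}

// Great-circle distance in metres between two {lat, lon} points (haversine).
function haversine(a, b) {
    var dLat = toRad(b.lat - a.lat);
    var dLon = toRad(b.lon - a.lon);
    var h = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
            Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) *
            Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Hypothetical estimate: plain centroid of the observations plus the
// radius (nearest-rank 95th percentile of distances) around it.
// Assumes a non-empty array and no observations straddling +/-180 degrees.
function estimatePosition(observations) {
    var n = observations.length;
    var centroid = { lat: 0, lon: 0 };
    observations.forEach(function (o) {
        centroid.lat += o.lat / n;
        centroid.lon += o.lon / n;
    });
    var distances = observations.map(function (o) {
        return haversine(centroid, o);
    }).sort(function (x, y) { return x - y; });
    var rank = Math.max(1, Math.ceil(0.95 * n));
    return {
        lat: centroid.lat,
        lon: centroid.lon,
        posAccuracy: distances[rank - 1]
    };
}
```

Note that a radius like this says nothing about whether the observations were all in a line, which is why the service, with the raw data, is better placed to produce it.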




  'range'  bad name -> 'rangeEstimate'

Conceptually, this is an estimate of the distance at which the signal level drops below some particular strength, perhaps usable strength. So the documentation should explain that. Of course, for different radio technologies the threshold strength is probably different, so what is this really? Is this a property of the radioClass or is this an estimate based on the observations?


  'samples' bad name ->  'obsNumber' or 'numObs' or 'numSamples'

The name 'samples' suggests it is the samples themselves but it is actually a number. The text says it is the number of observations used to determine the position but we have already seen the position might have been defined. So the documentation needs to be clear what other entries are based on these observations: i.e. the 'range' or 'averageSignal'.


  'averageSignal' -> !?

Ouch. Hmm. What is this telling us about? Is this to help us estimate the quality of the observations or to help us estimate the quality of the position estimate? 'Max', 'Median', and 'Min' might help with the former; some kind of referent of 'MaxEverForRadioClass' and 'MinEverForRadioClass' in the documentation would be needed for the latter. A straight mathematical average for a 2D spatial estimate is crazy problematic to interpret a posteriori so I am really not sure what this is supposed to provide users. Some clarity of the usage of this number and its behaviour in the field is needed in the documentation.



  'created'
  'updated'

Are these purely database modification times or are these related to the observations? If the latter, 'firstObserved' and 'lastObserved' would be better names.

Why make it an ambiguous timestamp, when you can make it an unambiguous ISO 8601 Date (e.g. 2014-07-24T12:16:36Z)?
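
The conversion is cheap. As a sketch, assuming the exported value is a Unix timestamp in whole seconds UTC (which is my guess at what 'timestamp' means here) and using a hypothetical helper name:

```javascript
// Hypothetical helper: convert a Unix timestamp in whole seconds (UTC)
// to an ISO 8601 string without fractional seconds.
function toIso8601(unixSeconds) {
    // Date expects milliseconds; toISOString() always returns UTC with
    // a '.sssZ' suffix, which is trimmed here to a plain 'Z'.
    return new Date(unixSeconds * 1000)
        .toISOString()
        .replace(/\.\d{3}Z$/, 'Z');
}

var updated = toIso8601(1406204196);
// updated is '2014-07-24T12:16:36Z'
```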





This is not the API I would have expected.

Without one or a few ways to estimate the accuracy of the position, these records are of little use for positioning. Without a richer description of the spatial structure of the observations, like bounding boxes or partial bounding boxes, these records are of little use in defining the quality of the overall database. So we are left with being able to get summary records which neither provide a well defined estimate of position and other values nor provide a rich summary of the data. As it stands, this API encourages direct, uncritical use of the positions; since OpenCellId estimates several antennae as being in the middle of the ocean, this is not great.

Have you developed a set of usage examples for this API? Are those written up somewhere? What is the goal of such usages? I have a difficult time guessing as to the motivations which led to such an API.

~adrian