On 26. August 2014 20:09:19 MESZ, Adrian Custer <[email protected]> wrote:
>On 8/26/14 11:14 AM, Hanno Schlichting wrote:
>> Hi.
>>
>> It’s been a long time in the planning, but we are finally getting
>> closer to actually making the aggregated cell network data available
>> for download.
>>
>> We have worked with the OpenCellID project to agree on a new shared
>> export format, to make it easier for anyone using either of our two
>> data sources. The details of the new data format are documented at
>> http://mozilla-ichnaea.readthedocs.org/en/latest/import_export.html.
>>
>> As a concrete example, I’ve exported some of the most recent cell
>> networks from our live database. You can get the sample at:
>> https://www.dropbox.com/s/vcmjuozhv0fjpmm/MLS-diff-cell-export-2014-08-26T130000.csv.gz?dl=0
>>
>> The data is licensed under CC-0 terms, so has neither copyright nor
>> database right restrictions
>> (https://creativecommons.org/publicdomain/zero/1.0/).
>>
>> If you haven’t followed the github issues about this topic, now’s
>> your time to share feedback and concerns.
>>
>> If all goes well, we are looking at adding a public downloads section
>> to the website next Tuesday and making all of our cell network data
>> available.
>>
>> Best, Hanno
>> _______________________________________________
>> dev-geolocation mailing list
>> [email protected]
>> https://lists.mozilla.org/listinfo/dev-geolocation
>>
>
>Hello all,
>
>Hanno's email was timely since I was about to send a mail asking about
>this API effort. I do have "Feedback and Concerns"; here are some.
>
>At the process level, a week before launch is *very* late to be asking
>for feedback on a public API! The GeoLocation Web API and the Mozilla
>Location Service upload API both would have benefited from some good,
>structured logic to avoid their bad structure and naming. I guess the
>discussion on this API was all happening on GitHub issues and not on
>this list.
>For something 'a long time in the planning', though, this public
>notice is sadly late.
>
>The Internet end-of-line separator is [carriage return, line feed], as
>per IETF standards:
>   The Text/Plain media type is the lowest common denominator of
>   Internet email, with lines of no more than 998 characters (by
>   convention usually no more than 78), and where the carriage-return
>   and line-feed (CRLF) sequence represents a line break (see [MIME-IMT]
>   and [MSG-FMT]).
>      http://tools.ietf.org/html/rfc3676
>This started with email, was kept for HTTP, and, absent strong reasons
>to change it, sticking with the standard is the best policy.
>
>The proposed API needs work: the semantics are a mishmash and the
>naming is terrible. The page name 'import-export' is in direct conflict
>with the API structure, which appears to have been thought out only
>for export. Here I only discuss export, because developing an API that
>can serve both will take far more time and we might as well start
>somewhere. Nonetheless, the clearer the names and documentation, the
>more reusable the element.
>
>Semantically, the API is offering a set of individual data records,
>each of which consists of:
>   * a set of labels which jointly identify an individual antenna
>   * known properties of the antenna
>   * measurements of that antenna
>   * estimated properties of that antenna
>   * record metadata
>Unfortunately, neither the names nor the documentation properly
>separate out these roles.
>
>Let's walk through the proposal:
>
>   'mcc'    okay, but -> 'mobCountryId' to match the others
>   'net'    bad name  -> 'mobProviderId'
>   'area'   bad name  -> 'mobAreaId'
>   'cell'   bad name  -> 'mobCellId'
>   'unit'   bad name  -> 'mobSubUnitId'
>
>Come on! 'net', 'area', 'cell' and 'unit' have generic meanings in the
>world that have nothing to do with your API. Put a little more effort
>into your naming, please! Save your users some headaches.
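A minimal JavaScript sketch of the points above follows. It reads the export while tolerating either CRLF or bare LF line endings, renames the five identifier columns as proposed, and joins them into one composite key, since they only identify an antenna jointly. The helper names (`splitRecords`, `parseExport`, `antennaId`) and the `PROPOSED_NAMES` table are illustrative assumptions, not part of any published format. The plain comma split assumes no quoted fields. For reference, RFC 4180, which describes CSV, also uses CRLF as its record delimiter.

```javascript
// Hypothetical mapping from the draft export column names to the names
// proposed in this message. Columns not listed keep their original name.
const PROPOSED_NAMES = {
  mcc: "mobCountryId",
  net: "mobProviderId",
  area: "mobAreaId",
  cell: "mobCellId",
  unit: "mobSubUnitId",
};

// The five columns that identify an antenna only when taken together.
const ID_FIELDS = ["mobCountryId", "mobProviderId", "mobAreaId", "mobCellId", "mobSubUnitId"];

// Split an export body into non-empty records, accepting CRLF or bare LF.
function splitRecords(text) {
  return text.split(/\r?\n/).filter((line) => line.length > 0);
}

// Turn "header line + data lines" into objects keyed by the proposed names.
// Naive comma split: assumes no field contains a quoted comma.
function parseExport(text) {
  const [headerLine, ...rows] = splitRecords(text);
  const names = headerLine.split(",").map((name) => PROPOSED_NAMES[name] || name);
  return rows.map((row) => {
    const values = row.split(",");
    const record = {};
    names.forEach((name, i) => {
      record[name] = values[i];
    });
    return record;
  });
}

// Join the identifier fields into one key, e.g. for use in a Map of antennas.
// An empty field still contributes an (empty) slot so keys stay aligned.
function antennaId(record) {
  return ID_FIELDS.map((name) => record[name] || "").join(":");
}
```

Take the made-up header `radio,mcc,net,area,cell,unit,lon,lat` and the row `GSM,208,1,2,3,,2.35,48.85`. `parseExport` yields one record whose `antennaId` is `208:1:2:3:`. The trailing empty slot comes from the empty `unit` column.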
>
>All of these seem, from what I can tell, to be code identifiers which,
>JOINTLY, label the specific radio antenna which is the subject of the
>data record. Semantically we really have
>    'antennaId' : 'mcc'&'net'&'area'&'cell'&'unit'
>but here we use five fields instead of one. Fine, but this needs clear
>documentation. In a JSON API these would properly sit together in a
>sub-structure, but since this is a flat API we just need clarity in
>the documentation stating that jointly these identifiers provide a
>unique label for each record.
>
>Ideally, these names would all have the form 'id...', but English
>places its adjectives first, 'red car' (versus 'voiture rouge' in
>French or 'coche rojo' in Spanish), so we end up with the structure
>'...Id'. My proposed shared prefix 'mob' helps make clear that these
>fields are all similar and work jointly.
>
>   'radio'   bad name -> 'radioClass' or some such
>
>The current MozLocService item upload has a similarly crappy naming
>approach: each item has a 'radio' element, but each observed cell in
>the item also has its own 'radio' element, of course with different
>data. So I have to keep this ridiculous lookup object:
>var CELL_TYPE_LOOKUP = {
>    // Maps a detailed network type to the two 'radio' values the
>    // upload API expects: [cell.radio, item.radio].
>    'type':  ['cell.radio', 'item.radio'], //Header field
>
>    'gsm':   ['gsm',  'gsm'],  //2G GSM
>    'edge':  ['gsm',  'gsm'],  //2G EDGE
>    'gprs':  ['gsm',  'gsm'],  //2G GPRS
>    'umts':  ['umts', 'gsm'],  //3G UMTS
>    'hspa':  ['umts', 'gsm'],  //3.5G HSPA
>    'hsdpa': ['umts', 'gsm'],  //3.5G HSDPA
>    'hspa+': ['umts', 'gsm'],  //3.5G HSPA+
>    'hsupa': ['umts', 'gsm'],  //3.5G HSUPA
>
>    'cdma':  ['cdma', 'cdma'], //2G CDMA
>    'is95a': ['cdma', 'cdma'], //2G CDMA
>    'is95b': ['cdma', 'cdma'], //2G CDMA
>    '1xrtt': ['cdma', 'cdma'], //2G CDMA
>    'evdo0': ['cdma', 'cdma'], //3G CDMA
>    'evdoa': ['cdma', 'cdma'], //3G CDMA
>    'evdob': ['cdma', 'cdma'], //3G CDMA
>    'ehrpd': ['cdma', 'cdma'], //4G CDMA
>
>    'lte':   ['lte',  'gsm']   //4G LTE
>};
>to generate what is required.
>I take it this proposed API element is the middle column. I have taken
>to naming the first column 'radioType', the second 'radioClass', and
>the third 'radioFamily', but these names are arbitrary. First you need
>to decide on your name, and then you need a bunch more documentation
>providing essentially this lookup table to explain it to users.
>
>   'lon'
>   'lat'
>
>The documentation should state that the Coordinate Reference System
>for these is the CRS used by GPS, i.e. WGS84. "The prime meridian is 0
>degrees" is a tautology; that's what 'prime meridian' means. More
>properly, this could be "The Prime Meridian (with value 0 degrees) is
>the IERS Reference Meridian, close to, but not the same as, the
>Greenwich Airy Meridian."
>    https://en.wikipedia.org/wiki/World_Geodetic_System
>    http://spatialreference.org/ref/epsg/4326/
>
>   'changeable'   terrible name
>
>As far as I can tell, this only applies to the location of the
>antenna, so the name needs to be linked to the position. From the
>consumer standpoint, the only interesting thing is how the position
>has been 'determined': either defined or estimated, and if the latter,
>the user probably wants some notion of how it was estimated. This
>could be done in a single field or in two, depending on what you want:
>
>   -> 'posEstimationMethod'   DEFINED || CENTROID || ALGO_6
>or
>   -> 'posDetermination'      DEFINED || ESTIMATED
>   -> 'posEstimationAlgo'     MEASURED || CENTROID || ...
>
>The best way, given the variety of possible algorithms, would be to
>define a few and then use an HTTP URI (i.e. a URL) for the rest, where
>the link points to a web page describing the estimation algorithm or
>process. Otherwise the documentation needs some indication of how the
>position estimates were derived.
>
>Are you punting completely on giving any estimate of the accuracy of
>the position?
>I would expect a
>
>   'posAccuracy'
>
>giving a 95% confidence radius around the estimated position, since
>that is the crucial factor which makes the position usable or not.
>(The only other element of your API that would let me guess at the
>quality of the data is the number of observations, but that does not
>tell me whether they were all in a line or well distributed
>spatially.) Since the service, which has all the data, is the only
>party that can properly make this estimate, it seems this should be
>generated for each record.
>
>   'range'   bad name -> 'rangeEstimate'
>
>Conceptually, this is an estimate of the distance at which the signal
>level drops below some particular strength, perhaps the usable
>strength, so the documentation should explain that. Of course, the
>threshold strength is probably different for different radio
>technologies, so what is this really? Is it a property of the
>radioClass, or an estimate based on the observations?
>
>   'samples'   bad name -> 'obsNumber' or 'numObs' or 'numSamples'
>
>The name 'samples' suggests the samples themselves, but the field is
>actually a count. The text says it is the number of observations used
>to determine the position, but we have already seen the position
>might have been defined. So the documentation needs to be clear about
>which other entries are based on these observations, e.g. 'range' or
>'averageSignal'.
>
>   'averageSignal'   -> !?
>
>Ouch. Hmm. What is this telling us about? Is it meant to help us
>estimate the quality of the observations, or the quality of the
>position estimate? 'Max', 'Median', and 'Min' might help with the
>former; some kind of referent such as 'MaxEverForRadioClass' and
>'MinEverForRadioClass' in the documentation would be needed for the
>latter. A straight arithmetic average for a 2D spatial estimate is
>crazy problematic to interpret a posteriori, so I am really not sure
>what this is supposed to provide users.
>Some clarity about the usage of this number and its behaviour in the
>field is needed in the documentation.
>
>   'created'
>   'updated'
>
>Are these purely database modification times, or are they related to
>the observations? If the latter, 'firstObserved' and 'lastObserved'
>would be better names.
>
>Why use an ambiguous timestamp when you can use an unambiguous ISO 8601
>date-time (e.g. 2014-07-24T12:16:36Z)?
>
>This is not the API I would have expected.
>
>Without one or more ways to estimate the accuracy of the position,
>these records are of little use for positioning. Without a richer
>description of the spatial structure of the observations, like
>bounding boxes or partial bounding boxes, these records are of little
>use in assessing the quality of the overall database. So we are left
>with summary records which neither provide a well-defined estimate of
>position and the other values nor provide a rich summary of the data.
>As it stands, this API encourages direct, uncritical use of the
>positions; since OpenCellID estimates several antennas as being in the
>middle of the ocean, this is not great.
>
>Have you developed a set of usage examples for this API? Are they
>written up somewhere? What is the goal of such usages? I have a
>difficult time guessing at the motivations which led to such an API.
>
>~adrian
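The 'posAccuracy' request can be made concrete with a minimal JavaScript sketch. It shows what a service holding the raw observations might compute for a CENTROID-style estimate. The position is the plain mean of the observed latitudes and longitudes, and the accuracy is the radius containing 95% of the observations (nearest-rank percentile of haversine distances). This is an assumption-laden stand-in, not the service's algorithm. It is not a confidence interval for the centroid itself, it treats the Earth as a sphere of radius 6,371 km, and averaging raw longitudes breaks near the ±180° meridian. The function names are hypothetical. The output fields reuse the names proposed in this thread.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius; spherical approximation

const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle (haversine) distance in metres between two {lat, lon} points.
function haversineMeters(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
}

// Hypothetical CENTROID estimate: arithmetic mean of the observed positions.
// Only reasonable for observations spread over a small area away from ±180°.
function centroid(observations) {
  const n = observations.length;
  const sum = observations.reduce(
    (acc, o) => ({ lat: acc.lat + o.lat, lon: acc.lon + o.lon }),
    { lat: 0, lon: 0 }
  );
  return { lat: sum.lat / n, lon: sum.lon / n };
}

// Radius (metres) around `center` containing `fraction` of the observations,
// using the nearest-rank percentile of their distances from `center`.
function coverageRadius(center, observations, fraction = 0.95) {
  const distances = observations
    .map((o) => haversineMeters(center, o))
    .sort((x, y) => x - y);
  const rank = Math.ceil(fraction * distances.length); // 1-based nearest rank
  return distances[Math.max(rank, 1) - 1];
}

// One possible shape for an exported position together with its accuracy.
function estimatePosition(observations) {
  if (observations.length === 0) {
    throw new RangeError("no observations to estimate from");
  }
  const center = centroid(observations);
  return {
    lat: center.lat,
    lon: center.lon,
    posEstimationMethod: "CENTROID",
    posAccuracy: coverageRadius(center, observations),
    numSamples: observations.length,
  };
}
```

Because the radius comes from the spread of the observations, a line of observations along a road and a well-spread cloud with the same count get different `posAccuracy` values. The observation count alone cannot show that difference.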
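The ISO 8601 suggestion for 'created' and 'updated' costs little to implement. The sketch below assumes, without this thread confirming it, that the draft export stores these fields as whole seconds since the Unix epoch in UTC. `toIso8601` and `fromIso8601` are hypothetical helper names built on the standard JavaScript `Date` object.

```javascript
// Convert seconds since the Unix epoch (UTC) to an ISO 8601 string such as
// "2014-07-24T12:16:36Z". Assumes whole seconds, not milliseconds.
function toIso8601(epochSeconds) {
  if (!Number.isFinite(epochSeconds)) {
    throw new TypeError(`not a timestamp: ${epochSeconds}`);
  }
  // toISOString() always prints UTC with milliseconds ("...36.000Z");
  // drop the fractional part to match the whole-second form.
  return new Date(epochSeconds * 1000).toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Parse the ISO form back to epoch seconds (Date.parse returns milliseconds).
function fromIso8601(text) {
  const ms = Date.parse(text);
  if (Number.isNaN(ms)) {
    throw new RangeError(`not an ISO 8601 date-time: ${text}`);
  }
  return Math.floor(ms / 1000);
}
```

For example, `toIso8601(1406204196)` returns `"2014-07-24T12:16:36Z"`, the example date-time used above. The explicit `Z` removes any doubt about the time zone, and the written-out form removes any doubt about whether the number was seconds or milliseconds.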
To point this out further: we need a column that tells us whether the
cell's signal is radial (omnidirectional) or covers only a sector. For
a sector we would also need its opening angle and its direction (given
as an angle, too).

Regards,
Felix
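Felix's sector columns could be used roughly as below, in a JavaScript sketch under stated assumptions. The field names `azimuth` (direction of the sector's centre, in degrees clockwise from north) and `beamwidth` (opening angle, in degrees) are hypothetical. `sector: null` stands for an omnidirectional cell, and the 'range' value is treated as a hard radius on a spherical Earth. The coverage test uses the standard initial great-circle bearing from the cell to the point.

```javascript
const EARTH_RADIUS_M = 6371000; // spherical approximation
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Initial great-circle bearing from `from` to `to`, in degrees [0, 360).
function bearingDegrees(from, to) {
  const phi1 = toRad(from.lat);
  const phi2 = toRad(to.lat);
  const dLon = toRad(to.lon - from.lon);
  const y = Math.sin(dLon) * Math.cos(phi2);
  const x = Math.cos(phi1) * Math.sin(phi2) - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLon);
  return (toDeg(Math.atan2(y, x)) + 360) % 360;
}

// Haversine distance in metres between two {lat, lon} points.
function distanceMeters(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
}

// Hypothetical cell record:
//   { lat, lon, range, sector: null | { azimuth, beamwidth } }
function covers(cell, point) {
  if (distanceMeters(cell, point) > cell.range) return false;
  if (cell.sector === null) return true; // omnidirectional ("radial") cell
  const bearing = bearingDegrees(cell, point);
  // Signed smallest difference between bearing and sector centre, in [-180, 180).
  const diff = ((bearing - cell.sector.azimuth + 540) % 360) - 180;
  return Math.abs(diff) <= cell.sector.beamwidth / 2;
}
```

Consider a sector centred due north (`azimuth: 0`) and 120° wide. A point about 1.1 km north of the site is covered, and a point the same distance south is not. With `sector: null` both points are covered, provided they lie within `range`.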
