[I'm going to break my rule of not posting to the mailing lists for this, because it's an interesting question and important for OSM. Since I started writing this, Robert has made an excellent posting which covers much of the same ground and comes to related conclusions, but from a slightly different angle.]
Alex Barth wrote:
> I just updated the Wiki with a proposed community guideline
> on geocoding.
>
> In a nutshell: geocoding with OSM data yields Produced Work,
> share alike does not apply to Produced Work, other ODbL
> stipulations such as attribution do apply. The goal is to
> remove all uncertainties around geocoding

This is an interesting approach and one that I think has potential, but perhaps in a different form from that proposed.

I find it difficult to square the claim that geocoding results are always a Produced Work with the text of the ODbL.

In the medium term, we want complete address data in major urban areas (those of highest commercial demand). This implies address data on every building object in the city. Geocoding street addresses against such a dataset (the 90% case) is essentially a clever lookup function: it extracts raw OSM data (lat/lon pairs) from the database via a query, and then does no significant transformative work on those lat/lon pairs. That is ODbL's definition of a Derivative Database, reinforced by the option in 4.6.b of "an algorithm... that make[s] up all the differences between the Database and the Derivative Database".

As Randy has alluded to, geocoders are powerful tools which put much effort into providing reliable results. To argue that this effort results in a Produced Work, you would have to:

- agree that a collection of lat/lon pairs (the result of geocoding) is analogous to the creative-world examples of "image, audiovisual material, text, or sounds", and
- agree that this holds true for a significant majority of geocoding results, particularly for the data that is most likely to be extracted (e.g. San Francisco is more likely to be extracted than deepest Idaho).

To me, those statements seem like a leap beyond what OSMF and the OSM community would be comfortable taking right now.
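The "clever lookup function" described above can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical building objects tagged with OSM's addr:housenumber and addr:street keys and a crude normalization step; the ids, coordinates and street names are invented, and real geocoders do far more (abbreviations, interpolation, fuzzy matching). What it shows is that, in the common urban case, the lat/lon pair handed back is exactly the one stored in the database.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical building objects in the style of OSM data: an id, a position,
# and address tags. Ids, coordinates, and addresses are invented for illustration.
@dataclass
class Building:
    osm_id: int
    lat: float
    lon: float
    tags: dict


BUILDINGS = [
    Building(1001, 37.7793, -122.4193, {"addr:housenumber": "1", "addr:street": "Example Street"}),
    Building(1002, 37.7801, -122.4170, {"addr:housenumber": "25", "addr:street": "Sample Avenue"}),
    Building(1003, 37.7810, -122.4150, {"building": "yes"}),  # no address: never indexed
]


def normalize(housenumber: str, street: str) -> tuple:
    """Crude normalization: trim, lower-case, collapse internal whitespace."""
    return housenumber.strip().lower(), " ".join(street.lower().split())


# Index built once from the address-tagged buildings: (housenumber, street) -> (lat, lon).
INDEX = {
    normalize(b.tags["addr:housenumber"], b.tags["addr:street"]): (b.lat, b.lon)
    for b in BUILDINGS
    if "addr:housenumber" in b.tags and "addr:street" in b.tags
}


def geocode(housenumber: str, street: str) -> Optional[tuple]:
    """Return the stored lat/lon pair unchanged, or None if there is no match."""
    return INDEX.get(normalize(housenumber, street))


if __name__ == "__main__":
    print(geocode("25", "sample  avenue"))  # (37.7801, -122.417)
    print(geocode("99", "Nowhere Road"))    # None
```

Everything interesting in a production geocoder lives inside something like normalize(); the returned coordinates themselves are passed through untouched.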
However, despite this, I think the Produced Work angle is a potentially promising avenue towards "removing all uncertainties around geocoding". Instead of a blanket and potentially problematic statement that "geocoding with OSM data yields Produced Work", we should focus on the next level down. In other words, accept that data extracted from OSM by means of address queries remains ODbL-licensed OSM data, but then look at what is done with this data (how it is "used"), and whether this might be a Produced Work or a Collective Database.

In particular, I would throw into the mix what Matt generously called the Fairhurst Doctrine (https://lists.openstreetmap.org/pipermail/legal-talk/2009-October/002881.html, https://lists.openstreetmap.org/pipermail/legal-talk/2009-October/002911.html). This argues that if you match ODbL data against third-party data by means of a simple query, the table mapping ids from one to the other is not qualitatively substantial; therefore the two datasets become a Collective Database, in which the third-party data can be licensed any which way.

So let's try this with one of Alex's examples: the first one, in which "the store locations are being exposed to the public on a store locator map using Bing maps". If you reference the store addresses against OSM address data, then under the Fairhurst Doctrine the result is a Collective Database: the address data in OSM in one database, the store data in another, and a simple mapping between the two (imagine it as a separate table for now). Therefore the store data is not subject to ODbL.

There is one major question here: whether geocoding is just "a simple query", or whether it is something big, difficult and complicated. The latter is just another way of saying "qualitatively substantial", which would mean that the table mapping ids between the databases becomes derivative, and the result can't be a Collective Database.
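The Collective Database reading of the store-locator example can be sketched as three separate tables: OSM address data, third-party store data, and a table holding nothing but matched id pairs. This is a minimal sketch using Python's built-in sqlite3 module; the table names, columns and sample rows are hypothetical, and matching on a lower-cased house number and street stands in for the "simple query".

```python
import sqlite3

# Hypothetical schema: each table is kept separate, as in the Collective Database reading.
SCHEMA = """
CREATE TABLE osm_address (
    osm_id INTEGER PRIMARY KEY, housenumber TEXT, street TEXT, lat REAL, lon REAL
);
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY, name TEXT, opening_hours TEXT, housenumber TEXT, street TEXT
);
CREATE TABLE store_osm_map (store_id INTEGER, osm_id INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# ODbL-licensed OSM address data (invented sample rows).
conn.executemany(
    "INSERT INTO osm_address VALUES (?, ?, ?, ?, ?)",
    [(1001, "1", "Example Street", 37.7793, -122.4193),
     (1002, "25", "Sample Avenue", 37.7801, -122.4170)],
)
# Third-party store data (invented sample rows).
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?, ?, ?)",
    [(1, "Store A", "Mo-Fr 09:00-17:00", "25", "sample avenue"),
     (2, "Store B", "Sa 10:00-14:00", "7", "Unmatched Road")],
)

# The "simple query": match on address and keep only the pair of ids.
conn.execute("""
    INSERT INTO store_osm_map (store_id, osm_id)
    SELECT s.store_id, a.osm_id
    FROM store s
    JOIN osm_address a
      ON lower(s.housenumber) = lower(a.housenumber)
     AND lower(s.street) = lower(a.street)
""")
conn.commit()

mapping = conn.execute(
    "SELECT store_id, osm_id FROM store_osm_map ORDER BY store_id"
).fetchall()

# What the store locator map would display: store name plus the OSM position.
locator = conn.execute("""
    SELECT s.name, a.lat, a.lon
    FROM store s
    JOIN store_osm_map m ON m.store_id = s.store_id
    JOIN osm_address a ON a.osm_id = m.osm_id
    ORDER BY s.store_id
""").fetchall()

print(mapping)  # [(1, 1002)]
print(locator)  # [('Store A', 37.7801, -122.417)]
```

Here the store table never receives OSM coordinates; they are only brought together at display time through the id table. If the matching step were replaced by something much more elaborate, that id table is where the "qualitatively substantial" question would bite.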
Again, this would be up to OSMF to decide in consultation with the community. I'd personally argue that, more often than not, it's a "simple query". I don't mean any disrespect to Sarah and Brian, or Randy's geocoding experts, or any of the other people working on geocoders. A 100% geocoder is undoubtedly a ball of hurt. But take the urban street addresses from above, which are likely to make up the majority of the queries thrown at a geocoder: resolving them remains, at heart, a predictable algorithmic translation of OSM data. Edge cases are the hard part, and there are plenty of them, but edge cases are by definition not substantial.

By looking at the "use", and considering whether it counts as a Collective Database or a Produced Work, we should be able to come up with clear answers for all of the common geocoding use cases. Yes, there'll always be some scenarios where it could go either way: that's inevitable when lawyers are involved. But those should be kept to an absolute minimum and, most importantly, this approach should stand up against the letter and intent of the licence we all signed up to.

Richard

_______________________________________________
legal-talk mailing list
legal-talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/legal-talk