[I'm going to break my rule of not posting to the mailing lists for this,
because it's an interesting query and important for OSM. Since I started
writing this, Robert has made an excellent posting which covers much of the
same ground and comes to related conclusions, but from a slightly different
angle]

Alex Barth wrote:
> I just updated the Wiki with a proposed community guideline
> on geocoding.
>
> In a nutshell: geocoding with OSM data yields Produced Work, 
> share alike does not apply to Produced Work, other ODbL 
> stipulations such as attribution do apply. The goal is to 
> remove all uncertainties around geocoding

This is an interesting approach and one that I think has potential, but
perhaps in a different form from that proposed.

I find it difficult to square geocoding results always being a Produced Work
with the text of the ODbL. In the medium term, we want complete address data
in major urban areas (those of highest commercial demand). This suggests
address data on every building object in the city.

Geocoding street addresses against such a dataset (the 90% case) is
essentially a clever lookup function: it is extracting raw OSM data (lat/lon
pairs) from the database via a query, and then not doing any significant
transformative work on the lat/lon pairs. That is ODbL's definition of a
Derivative Database, reinforced by the option in 4.6.b of "an algorithm...
that make[s] up all the differences between the Database and the Derivative
Database".
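
To make the "clever lookup" point concrete, here is a minimal sketch in
Python. Everything in it is hypothetical: the table, the addresses and the
coordinates are invented, and no real geocoder is this simple. It only shows
the shape of the 90% case, where the lat/lon pair that comes out is exactly
the one stored in the database.

```python
# Hypothetical stand-in for OSM building objects carrying address tags.
# The addresses and coordinates are invented for illustration only.
OSM_ADDRESSES = {
    ("1", "example street", "exampleville"): (37.7750, -122.4183),
    ("2", "example street", "exampleville"): (37.7751, -122.4185),
}


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def geocode(housenumber, street, city):
    """Return the stored (lat, lon) for an address, or None if not found.

    No transformation is applied to the coordinates: the function only
    selects an existing row from the dataset.
    """
    key = (normalise(housenumber), normalise(street), normalise(city))
    return OSM_ADDRESSES.get(key)
```

Calling geocode("1", "Example Street", "Exampleville") hands back the stored
pair unchanged, which is the sense in which I'd call it an extraction rather
than a transformation.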

As Randy has alluded, geocoders are powerful tools which put much effort
into providing reliable results. To argue that this effort results in a
Produced Work, you would have to:

- agree that a collection of lat/lon pairs (the result of geocoding) is
analogous to the creative-world examples of "image, audiovisual material,
text, or sounds", and
- agree that this holds true for a significant majority of geocoding
results, particularly for the data that is most likely to be extracted
(e.g. San Francisco is more likely to be extracted than deepest Idaho).

To me, those statements seem like a leap beyond what OSMF and the OSM
community would be comfortable taking right now.


However, despite this, I think the Produced Work angle is potentially a
promising avenue towards "removing all uncertainties around geocoding".

Instead of a blanket and potentially problematic statement that "geocoding
with OSM data yields Produced Work", we should focus on the next level down.
In other words, accept that data extracted from OSM by means of address
queries remains ODbL-licensed OSM data: but then look at what is done with
this data (how it is "used"), and whether this might be a Produced Work or a
Collective Database.

In particular, I would throw into the mix what Matt generously called the
Fairhurst Doctrine
(https://lists.openstreetmap.org/pipermail/legal-talk/2009-October/002881.html,
https://lists.openstreetmap.org/pipermail/legal-talk/2009-October/002911.html).
This argues that if you match ODbL data against third-party data by means of
a simple query, the table mapping ids from one to the other is not
qualitatively substantial: therefore the two datasets become a Collective
Database, in which the third-party data can be licensed any which way.

So let's try this with one of Alex's examples: the first one, in which "the
store locations are being exposed to the public on a store locator map using
Bing maps".

If you reference the store addresses against OSM address data, following the
Fairhurst Doctrine, the result is a Collective Database: the address data in
OSM in one database, the store data in another, and a simple mapping between
the two (imagine it as a separate table for now). Therefore the store data
is not subject to ODbL.
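
As a rough illustration of that three-part structure, here is a sketch in
Python. The ids, names, addresses and coordinates are all made up, and the
exact-match join is my assumption about what "a simple query" might look
like. It shows the data layout only; it does not settle the licensing
question.

```python
# Database 1: ODbL-licensed address data (hypothetical ids and coordinates).
osm_addresses = {
    101: {"address": "1 Example Street", "lat": 37.7750, "lon": -122.4183},
    102: {"address": "2 Example Street", "lat": 37.7751, "lon": -122.4185},
}

# Database 2: third-party store data, kept entirely separate.
stores = {
    "S1": {"name": "Example Store", "address": "2 Example Street"},
}


def build_mapping(stores, osm_addresses):
    """Build the separate table mapping store ids to OSM address ids.

    The join is a plain case-insensitive match on the address string.
    """
    by_address = {
        row["address"].lower(): osm_id
        for osm_id, row in osm_addresses.items()
    }
    mapping = {}
    for store_id, store in stores.items():
        osm_id = by_address.get(store["address"].lower())
        if osm_id is not None:
            mapping[store_id] = osm_id
    return mapping


# The mapping table: store id -> OSM address id, and nothing else.
mapping = build_mapping(stores, osm_addresses)


def store_location(store_id):
    """Resolve a store to coordinates by following the mapping table."""
    row = osm_addresses[mapping[store_id]]
    return row["lat"], row["lon"]
```

The store table never gains a lat/lon column: the coordinates stay in the
OSM-derived table and are only reached through the mapping.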

There is one major question in this: whether geocoding is just "a simple
query", or whether it's something big and difficult and complicated. The
latter is just another way of saying "qualitatively substantial", which
would mean that the table mapping ids between the databases becomes
derivative, and the result can't be a Collective Database.

Again, this would be up to OSMF to decide in consultation with the
community. I'd personally argue that, more often than not, it's a "simple
query". I don't mean any disrespect to Sarah and Brian, or Randy's geocoding
experts, or any of the other people working on geocoders. A 100% geocoder is
undoubtedly a ball of hurt. But take the urban street address case from
above, which is likely to account for the majority of queries thrown at a
geocoder: it remains at heart a predictable algorithmic translation of OSM
data. Edge
cases are the hard part, and there are plenty of them, but edge cases are by
definition not substantial.

By looking at the "use", and considering whether it counts as a Collective
Database or a Produced Work, we should be able to come up with clear answers
for all of the common geocoding use cases. Yes, there'll always be some
scenarios where it could go either way: that's inevitable when lawyers are
involved. But those scenarios should be kept to an absolute minimum, and most
importantly, it's something that should stand up against the letter and
intent of the licence we all signed up to.

Richard




