Great questions. I've attempted to answer a few of them below:

On 03/10/2017 09:56, Christoph Hormann wrote:
* To what extent has there been information transferred systematically
from Wikidata and Wikipedia to OSM based on wikidata ID references
(like adding names in different languages).  As others have explained
this would be legally problematic and it would be important to know how
common this is.

I agree that there are questions about OSM's acceptance of labels and statements copied from Wikidata, though I would've expected this phenomenon to be at least as common with Wikipedia long before the introduction of the wikidata tag.

Years ago, there was a campaign to add as many translations of country names as possible, using Wikipedia as the primary source. [1] A map renderer that uses these translations would logically want translations or transliterations for as many cities as possible, but my impression is that the OSM community would frown on such a massive expansion in city name transliterations. Instead, we can point data consumers to Wikidata as a source for this data.

* How stable is the identity of what can be found under a certain
Wikidata ID.  As mentioned there are cases where Wikidata aggregates
several concepts under one ID (like an administrative unit and a
populated place in case of cities/towns).  Would it be possible that
this changes?  If yes, would the original ID be re-purposed or would it
cease to exist?

To the extent that an administrative unit and populated place are considered separate entities, as they are for some kinds of places, Wikidata ideally maintains separate entities for each. The reality is less clear-cut, since much of Wikidata's original data on geographic and political entities comes from Wikipedia, which generally doesn't make such distinctions at the article title level. The Wikidata project aims to eventually create separate entities for every concept that Wikipedia has traditionally conflated inside the same article. Thus Wikidata maintains a separate entity for each Pokémon species, whereas the English Wikipedia combines them all into a few list articles. [2][3]

If an administrative unit or populated place (or both) ceases to exist, the QID remains valid, but a statement or qualifier is added to indicate "former" status, much like OSM's lifecycle tags (disused etc.). An entity may be redirected under some circumstances. For example, if the Wikidata community discovers that two entities are duplicates, referring to exactly the same concept, an editor will manually blank one in favor of the other, and a bot will create a redirect automatically. [4]

Many of the duplicate entities were created as a result of incorrect linking between Wikipedia article translations at the time Wikipedia article titles were being imported into Wikidata. If someone had translated the article "Pumpkin" from English to Pennsylvania German but neglected to link the English article to the Pennsylvania German one, Wikidata might've wound up with two entities, one linking to many languages including English, the other linking to only Pennsylvania German. Most likely the latter entity would end up redirecting to the former.

The English Wikipedia sees a couple dozen geographical articles renamed each day. [5] This is a rough estimate based on articles tagged with geographical coordinates. I don't know how many of these articles are the target of wikipedia tags in OSM -- I think that would require Yuri's SPARQL tool.

But the important thing to note is that a redirect on Wikipedia may not remain a redirect for long: editors may decide to repurpose the redirect page as a disambiguation page or perhaps an article on a subtly different topic. If that happens, an OSM data consumer would have to trawl through article history to determine which article each wikipedia tag really meant to refer to. By comparison, since integers are cheap, Wikidata entities don't tend to get repurposed the way Wikipedia article titles do, so even a stale QID can be traced to relevant data pretty easily.

* What is the qualification of Wikidata for having its IDs in OSM (both
for wikidata=* and X:wikidata=*)?  Is there a particular objective
criterion that qualifies it?  Would there be other external IDs that
would also qualify under these criteria?  Is there a limit in the
number of different external IDs OSM is going to accept?

Several other kinds of IDs have been added in large numbers in the past. Off the top of my head, there are the various ref schemes used in conjunction with the heritage tag, GNIS feature IDs associated with an import of POIs in the U.S., and of course regulatory IDs such as ICAO/IATA.

Far from opening the floodgates to external IDs, Wikidata gives us the ability to limit external ID tagging. Consider that Wikidata lists seven different external identifiers for Hamilton County, Ohio, United States. [6] If someone ever proposes to tag U.S. counties with FIPS or GeoNames codes, we can point out that the feature is tagged with a Wikidata QID and the Wikidata entity is tagged with FIPS and GeoNames codes, making additional OSM tagging unnecessary. So we can consider Wikidata to be a meta external database, yet we still have the flexibility to bring in other external IDs if that's what the community decides to do.
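
To illustrate the indirection described above, here is a minimal Python sketch of how a data consumer might pull an external identifier out of a Wikidata entity once it has the QID from OSM. It assumes the entity has already been fetched as JSON in the shape returned by the wbgetentities API. The function name external_id is hypothetical, the property IDs (P1566 for GeoNames, P882 for FIPS 6-4) are my understanding of the current Wikidata properties and should be double-checked, and the sample value is a placeholder rather than real data.

```python
def external_id(entity, property_id):
    """Return the first value stored for property_id on a Wikidata entity, or None.

    `entity` is assumed to be one entry of the "entities" object in a
    wbgetentities JSON response.
    """
    for claim in entity.get("claims", {}).get(property_id, []):
        snak = claim.get("mainsnak", {})
        # "novalue" / "somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


# Placeholder entity: the QID is the one cited in [6], but the identifier
# value below is made up for illustration only.
sample_entity = {
    "id": "Q152891",
    "claims": {
        "P1566": [  # assumed to be the GeoNames ID property
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {"value": "0000000", "type": "string"},
                }
            }
        ],
    },
}

geonames = external_id(sample_entity, "P1566")
fips = external_id(sample_entity, "P882")  # assumed FIPS 6-4 property; absent in the sample
```

With only the QID stored in OSM, the consumer decides which external schemes it cares about and asks Wikidata for those, instead of OSM carrying one tag per scheme.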

Also i think it would be of great importance for OSM and a functioning
communication in the community to have better documentation of:

* systematic wikidata ID addition/editing efforts (there seems to be
nothing listed currently on
https://wiki.openstreetmap.org/wiki/Category:Automated_edits_log)
* tag documentation of the wikidata tags.  This needs a lot of
improvement.  Like:

https://wiki.openstreetmap.org/wiki/Key:wikidata does not make clear if
these document 1:1 relationships between OSM features and wikidata
objects or not and what qualifies a wikidata ID to be 'about a
feature'.  How does a mapper practically verify if a certain wikidata
ID is correct on a certain feature?

I agree; finding the most effective way to explain these relationships in documentation will be an ongoing effort for some time. One problem I commonly see in existing wikidata=* mapping is that, for example, all the locations of a restaurant chain are given the same wikidata tag. wikidata=* is designed to be a 1:1 relationship, at least for POIs and routes. (I suppose a company may have more than one headquarters, though.) Tags like brand:wikidata=* were devised to preserve the wikidata tag's 1:1 relationship.
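
As a rough sketch of how the chain-location problem could be spotted, the Python below groups features by their wikidata value and reports any QID that appears on more than one feature. The feature structure, the element IDs, and the function name shared_wikidata_tags are all hypothetical; Q524757 is the KFC entity discussed later in this message. A hit is only a candidate for review, not proof of an error, since a feature mapped as several elements could legitimately repeat a QID.

```python
from collections import defaultdict


def shared_wikidata_tags(features):
    """Map each wikidata QID used on more than one feature to those feature IDs."""
    by_qid = defaultdict(list)
    for feature in features:
        qid = feature["tags"].get("wikidata")
        if qid:
            by_qid[qid].append(feature["id"])
    return {qid: ids for qid, ids in by_qid.items() if len(ids) > 1}


# Hypothetical features: two chain locations wrongly share wikidata=*,
# while the third uses brand:wikidata=* and is therefore not flagged.
features = [
    {"id": "node/1", "tags": {"name": "Terminal 1 KFC", "wikidata": "Q524757"}},
    {"id": "node/2", "tags": {"name": "Main Street KFC", "wikidata": "Q524757"}},
    {"id": "node/3", "tags": {"name": "Terminal 2 KFC", "brand:wikidata": "Q524757"}},
]

suspects = shared_wikidata_tags(features)
```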

As for the practicality of verifying wikidata tags, I think it's important for editors to fetch and display the label beside the QID whenever it's displayed. Perhaps also the description or "is a" statement.
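
Here is one way an editor might do that, sketched in Python under a few assumptions: it uses the wbgetentities action of the Wikidata API, the helper names (build_label_url, describe, fetch_and_describe) are hypothetical, and the display format is just one possibility. The parsing step is kept separate from the network call so it can be tried on a sample response.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_label_url(qid, lang="en"):
    """Build a wbgetentities request for one entity's label and description."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def describe(response, qid, lang="en"):
    """Format a QID with its label and description, falling back to the bare QID."""
    entity = response.get("entities", {}).get(qid, {})
    label = entity.get("labels", {}).get(lang, {}).get("value")
    description = entity.get("descriptions", {}).get(lang, {}).get("value")
    if label is None:
        return qid
    if description is None:
        return f"{qid} ({label})"
    return f"{qid} ({label}: {description})"


def fetch_and_describe(qid, lang="en"):
    """Fetch the entity over the network and format it (not exercised here)."""
    with urllib.request.urlopen(build_label_url(qid, lang)) as reply:
        return describe(json.load(reply), qid, lang)
```

A real editor would presumably also send a descriptive User-Agent header and cache the results, so that a mapper sees the label immediately and can tell at a glance whether the QID matches the feature.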

https://wiki.openstreetmap.org/wiki/Key:brand:wikidata is plain wrong,
brand:wikidata=* is not a machine-readable form of brand=*.  It in
particular needs to tell the mapper what types of wikidata object
should be referenced here and how a mapper can find the correct ID for
a certain feature.

I suppose that was a rhetorical flourish on my part. What would be the best way to describe the role played by the brand:wikidata value in this hypothetical example:

name=Terminal 1 KFC
brand=KFC
brand:wikidata=Q524757
operator=ACME Airport Concessions

where Q524757 is the Wikidata entry for KFC the fast food chain? To find Q524757, I went to https://en.wikipedia.org/wiki/KFC and clicked on "Wikidata item". This is what I would've done to find the QID of the London Eye to put in the London Eye's wikidata tag, if not for iD's basic Wikidata integration.

[1] http://web.archive.org/web/20121216044005/http://toolserver.org:80/~mazder/multilingual-country-list/
[2] https://www.wikidata.org/wiki/Q1647331
[3] https://en.wikipedia.org/wiki/List_of_generation_I_Pokémon#Raichu
[4] https://www.wikidata.org/wiki/Help:Merge
[5] https://quarry.wmflabs.org/query/22125
[6] https://www.wikidata.org/wiki/Q152891#identifiers

--
m...@nguyen.cincinnati.oh.us

