Re: [Dbpedia-discussion] URIs vs. other IDs (Was: New user interface for dbpedia.org)

Kingsley Idehen Fri, 13 Feb 2015 11:39:18 -0800

On 2/8/15 11:28 AM, Kingsley Idehen wrote:

On 2/7/15 6:07 PM, Markus Kroetzsch wrote:
Hi Kingsley,
We are getting a bit off-topic here, but let me answer briefly ...

On 07.02.2015 21:36, Kingsley Idehen wrote:
...
No, it isn't duplication. Wikipedia HTTP URLs identify Wikipedia
documents. DBpedia URIs identify entities associated with Wikipedia
documents. There's a world of difference here!
That's not my point (I know the difference, of course). Wikidata stores neither Wikipedia URLs nor DBpedia URIs. It just stores Wikipedia article names together with Wikimedia site (project) identifiers. The work to get from there to the URL is the same as the work to get to the URI. Storing either explicitly in another property value would only introduce redundancy (and potential inconsistencies). In a Linked Data export you could easily include one or both of these URIs, depending on the application, but it's not so clear that doing this in a data viewer would make much sense. Surely it would not be useful if people would have to enter all of this data manually three times.
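To illustrate the point that the Wikipedia URL and the DBpedia URI are both derivable from the same stored pair, here is a minimal sketch. It assumes plain Wikipedia site identifiers of the form "<lang>wiki", English DBpedia at dbpedia.org and other language editions at <lang>.dbpedia.org; the function names and the simplified escaping rule are hypothetical, not what Wikidata or DBpedia actually run.

```python
from urllib.parse import quote


def _language(site_id: str) -> str:
    # Assumption: only plain Wikipedia sites such as "enwiki" or "dewiki".
    if not site_id.endswith("wiki") or len(site_id) <= 4:
        raise ValueError(f"unsupported site identifier: {site_id}")
    return site_id[:-4]


def _page_name(title: str) -> str:
    # Spaces become underscores; the escaping here is a simplification.
    return quote(title.replace(" ", "_"), safe="_(),:'")


def wikipedia_url(site_id: str, title: str) -> str:
    """Build the URL of the Wikipedia document."""
    return f"https://{_language(site_id)}.wikipedia.org/wiki/{_page_name(title)}"


def dbpedia_uri(site_id: str, title: str) -> str:
    """Build the URI of the entity described by that document."""
    lang = _language(site_id)
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{host}/resource/{_page_name(title)}"


print(wikipedia_url("enwiki", "Douglas Adams"))
print(dbpedia_uri("enwiki", "Douglas Adams"))
```

Both functions start from the same (site, title) pair, which is why storing either result explicitly would be redundant.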
On that note, is it the current best practice that all linked data exports include links to all other datasets that contain related information (exhaustive two-way linking)? That seems like a lot of triples and not very feasible if the LOD Web grows (a bit like two-way HTML linking ... ;-). Wouldn't it be more practical to integrate via shared key values? In this case, Wikipedia URLs might be a sensible choice to indicate the topic of a resource, rather than requiring all resources that have a Wikipedia article as their topic to cross link to all (quadratically many) other such resources directly. I would be curious to hear your take on this.
There are similar issues with most of the other identifiers: they are
usually the main IDs of the database, not the URIs of the
corresponding RDF data (if available).
Hmm.. if you look at the identifiers on the viewer's right hand side,
you will find out (depending on your understanding of Linked Open Data
concepts) that they too identify entities that are associated with Web
pages, rather than web pages themselves.
Sure, but you are confusing the purpose of URIs with the underlying technical standard here. People use identifiers to refer to entities, of course, yet they do not use identifiers that are based on the URI standard. We both know about the limitations of this approach, but that does not change the shape of the IDs people use to refer to things (e.g., on Freebase, but it is the same elsewhere). Usually, if you want to interface with such data collections (be it via UIs or via APIs), you need to use their official IDs, while URIs are not supported.
This is also the answer to your other comment. You are only seeing the purpose of the identifier, and you rightly say that there should be no big technical issue to use a URI instead. I agree, yet it has to be done, and it has to be done differently for each case. There is no general rule for how to construct URIs from the official IDs used by open data collections on today's Web.
A related "problem" is that most online data sets have UIs that are much more user friendly than any LOD browser could be based on the RDF they export. There is no incentive for users to click on a LOD-based view of, say, IMDB, if they can just go to the IMDB page instead. This should be taken into account when building a DBpedia LOD view (back on topic! ;-): people who want to learn about something will usually be better served by going to Wikipedia; the target audience of the viewer is probably a different group who wants to inspect the DBpedia data set. This should probably affect how the UI is built, and maybe will lead to different design decisions than in the Wikidata browser I mentioned.
Markus
Markus,
Cutting a long story real short: yes, you have industry-standard identifiers, and likewise HTTP URIs that identify things in line with Linked Open Data principles. You simply use relations such as dcterms:identifier (and the like) to incorporate industry-standard identifiers into an entity description. Even better, those relations should be inverse-functional in nature. That's really it.
DBpedia Identifiers (HTTP URI based References) and Industry Standard Identifiers (typically literal in nature) aren't mutually exclusive.
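A minimal sketch of what "inverse-functional" buys you, using plain Python tuples as triples. The example.org URIs and the identifier value are made up for illustration, and the grouping function is hypothetical; it only shows the idea that two descriptions sharing an identifier value can be treated as describing the same entity.

```python
from collections import defaultdict

DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"

# Hypothetical data: two datasets describe the same thing under different URIs.
triples = [
    ("http://example.org/dataset-a/film/1", DCTERMS_IDENTIFIER, "ID-123"),
    ("http://example.org/dataset-b/movie/x", DCTERMS_IDENTIFIER, "ID-123"),
    ("http://example.org/dataset-a/film/2", DCTERMS_IDENTIFIER, "ID-456"),
]


def coreferent_groups(triples, prop=DCTERMS_IDENTIFIER):
    """Group subjects by identifier value, treating prop as inverse-functional."""
    by_value = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate == prop:
            by_value[value].add(subject)
    # Only values shared by more than one subject tell us anything new.
    return {v: subjects for v, subjects in by_value.items() if len(subjects) > 1}


print(coreferent_groups(triples))
```

In practice identifier values from different schemes can collide, so a scheme-specific relation (or a typed literal) is safer than a bare dcterms:identifier.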
Getting back on topic, Reasonator is a nice UI. What it lacks, from a DBpedia perspective, is incorporation of DBpedia URIs, an issue the author of the tool assured me he will be addressing as a high priority.

As a follow-up to the above: our biggest concern boils down to dealing with the following challenges, which strongly affect UI and UX:

1. replacing URIs with the objects of certain annotation-oriented relations (rdfs:label, skos:prefLabel, skos:altLabel, etc.)

2. paging results in situations where the number of relations associated with an entity description is very large, i.e., where the entity being described is the subject or object of many relations

3. addressing the above without destroying performance and scalability.

Labels for URIs:

In-built functionality in Virtuoso enables you to address the challenges above using inference and reasoning. We've used this for 6 years, and we know it scales up to 61 Billion+ triples. The more relations you add to the ontology that's being used as the inference rules basis, the more sophisticated the labeling output you end up with in return.
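The idea can be sketched in a few lines of plain Python. This is not Virtuoso's implementation, just an illustration under the assumption that the rule set consists of rdfs:subPropertyOf statements pointing at rdfs:label; the function names and the example data are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def label_properties(ontology):
    """Return rdfs:label plus every property declared (transitively) below it."""
    found = {RDFS_LABEL}
    changed = True
    while changed:
        changed = False
        for s, p, o in ontology:
            if p == RDFS_SUBPROPERTY_OF and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def display_label(uri, data, ontology):
    """Pick a label for uri via any label-like property; fall back to the URI."""
    props = label_properties(ontology)
    labels = sorted(o for s, p, o in data if s == uri and p in props)
    return labels[0] if labels else uri


# Hypothetical rule set and data.
ontology = [(SKOS_PREF_LABEL, RDFS_SUBPROPERTY_OF, RDFS_LABEL)]
data = [("http://dbpedia.org/resource/Berlin", SKOS_PREF_LABEL, "Berlin")]

print(display_label("http://dbpedia.org/resource/Berlin", data, ontology))
```

Adding another sub-property statement to the ontology list widens the set of relations that can supply a label, without touching the data or the lookup code.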


Paging:

Virtuoso can present a client with pages of relations in which the entity being described is a relation subject or object. Paging of this kind becomes a major challenge as datasets grow.
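From the client side, paging over an entity's relations can be approximated with a standard SPARQL query using LIMIT and OFFSET. The sketch below only builds the query text; the function name and the default page size are hypothetical, the entity URI is assumed to be a valid IRI that is safe to embed, and this says nothing about how Virtuoso pages internally.

```python
def relations_page_query(entity_uri: str, page: int, page_size: int = 50) -> str:
    """Build a SPARQL query for one page of relations about entity_uri.

    Covers both directions: the entity as subject and the entity as object.
    Pages are numbered from 0.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  {{ <{entity_uri}> ?p ?o }}\n"
        "  UNION\n"
        f"  {{ ?s ?p <{entity_uri}> }}\n"
        "}\n"
        # A stable order is needed so that consecutive pages do not overlap.
        "ORDER BY ?p ?s ?o\n"
        f"LIMIT {page_size} OFFSET {offset}"
    )


print(relations_page_query("http://dbpedia.org/resource/Berlin", page=2))
```

Rows where the entity is the subject leave ?s unbound, and rows where it is the object leave ?o unbound, so a client can tell the two directions apart.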


Example:

[1] http://lod.openlinksw.com/c/IJWDPD3 -- labels for URIs and results paging against the LOD Cloud cache (which has 61 Billion+ RDF statements).

[2] http://dbpedia.org/c/8BADDPR -- the same against DBpedia.

--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this


