Re: [Dbpedia-discussion] URIs vs. other IDs (Was: New user interface for dbpedia.org)

2015-02-13 Thread Kingsley Idehen

On 2/8/15 11:28 AM, Kingsley Idehen wrote:

On 2/7/15 6:07 PM, Markus Kroetzsch wrote:
...

Markus,

Cutting a long story real short: yes, you have industry standard 
identifiers, and ditto HTTP URIs that identify things in line with Linked 
Open Data principles.
You simply use relations such as dcterms:identifier (and the like) to 
incorporate industry standard identifiers into an entity description. 
Even better, those relations should be inverse-functional in nature. 
That's really it.


DBpedia Identifiers (HTTP URI based References) and Industry Standard 
Identifiers (typically literal in nature) aren't mutually exclusive.


Getting back on topic, Reasonator is a nice UI. What it lacks, from a 
DBpedia perspective, is incorporation of DBpedia URIs, an issue the 
author of the tool has assured me he will address as a high priority.


As a follow-up to the above, our biggest concern boils down to the 
following challenges, which strongly affect UI and UX:


1. replacing URIs with the objects of certain annotation-oriented relations 
(rdfs:label, skos:prefLabel, skos:altLabel, etc.)
2. paging results -- in situations where the number of relations 
associated with an entity description is 

Re: [Dbpedia-discussion] URIs vs. other IDs (Was: New user interface for dbpedia.org)

2015-02-08 Thread Kingsley Idehen

On 2/7/15 6:07 PM, Markus Kroetzsch wrote:

...


Markus,

Cutting a long story real short: yes, you have industry standard 
identifiers, and ditto HTTP URIs that identify things in line with Linked 
Open Data principles.
You simply use relations such as dcterms:identifier (and the like) to 
incorporate industry standard identifiers into an entity description. 
Even better, those relations should be inverse-functional in nature. 
That's really it.


DBpedia Identifiers (HTTP URI based References) and Industry Standard 
Identifiers (typically literal in nature) aren't mutually exclusive.
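A minimal sketch of the pattern described above, assuming plain (subject, predicate, object) tuples rather than the API of any particular RDF library: the entity keeps its HTTP URI as the subject, industry-standard identifiers are attached as literal values of dcterms:identifier, and a data-level check confirms that the relation behaves inverse-functionally (no identifier value is shared by two different subjects). The entity URIs and identifier values are hypothetical, and `identifier_index` is an illustrative helper, not an existing tool.

```python
from collections import defaultdict

DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"

# (subject URI, predicate URI, literal object); plain tuples, no RDF library.
Triple = tuple[str, str, str]


def identifier_index(triples: list[Triple],
                     predicate: str = DCTERMS_IDENTIFIER) -> dict[str, str]:
    """Map each identifier literal to the single subject URI carrying it.

    Raises ValueError if the data violates inverse-functionality, i.e. if
    one identifier value is attached to two different subjects.
    """
    subjects_by_value: dict[str, set[str]] = defaultdict(set)
    for subject, pred, value in triples:
        if pred == predicate:
            subjects_by_value[value].add(subject)
    index: dict[str, str] = {}
    for value, subjects in subjects_by_value.items():
        if len(subjects) > 1:
            raise ValueError(f"{value!r} identifies several entities: {sorted(subjects)}")
        index[value] = next(iter(subjects))
    return index


if __name__ == "__main__":
    # Hypothetical data: one entity URI carrying two literal identifiers.
    triples = [
        ("http://dbpedia.org/resource/Example_Entity", DCTERMS_IDENTIFIER, "EXAMPLE-0001"),
        ("http://dbpedia.org/resource/Example_Entity", DCTERMS_IDENTIFIER, "OTHER-42"),
        ("http://dbpedia.org/resource/Another_Entity", DCTERMS_IDENTIFIER, "EXAMPLE-0002"),
    ]
    index = identifier_index(triples)
    print(index["OTHER-42"])  # look up the entity URI from an official ID
```

The URI and the literal identifiers coexist on the same entity, so a client holding only an official ID can still reach the URI through the index.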


Getting back on topic, Reasonator is a nice UI. What it lacks, from a 
DBpedia perspective, is incorporation of DBpedia URIs, an issue the 
author of the tool has assured me he will address as a high priority.



--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: 

[Dbpedia-discussion] URIs vs. other IDs (Was: New user interface for dbpedia.org)

2015-02-07 Thread Markus Kroetzsch
Hi Kingsley,

We are getting a bit off-topic here, but let me answer briefly ...

On 07.02.2015 21:36, Kingsley Idehen wrote:
...

 No, it isn't duplication. Wikipedia HTTP URLs identify Wikipedia
 documents. DBpedia URIs identify entities associated with Wikipedia
 documents. There's a world of difference here!

That's not my point (I know the difference, of course). Wikidata stores 
neither Wikipedia URLs nor DBpedia URIs. It just stores Wikipedia 
article names together with Wikimedia site (project) identifiers. The 
work to get from there to the URL is the same as the work to get to the 
URI. Storing either explicitly in another property value would only 
introduce redundancy (and potential inconsistencies). In a Linked Data 
export you could easily include one or both of these URIs, depending on 
the application, but it's not so clear that doing this in a data viewer 
would make much sense. Surely it would not be useful if people had to 
enter all of this data manually three times.
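To illustrate the claim that getting from a stored article name to the Wikipedia URL takes the same work as getting to the DBpedia URI, here is a minimal sketch. It assumes a Wikidata-style sitelink is a (site id, article title) pair such as ("enwiki", "Albert Einstein"), that only language editions of Wikipedia ("<lang>wiki") are handled, and that both targets replace spaces with underscores. The helper names and the chapter-host rule for non-English DBpedia are assumptions for illustration; real title encoding has more edge cases.

```python
from urllib.parse import quote

# Hypothetical helpers: a sitelink is assumed to be (site id, article title),
# e.g. ("enwiki", "Albert Einstein"). Only "<lang>wiki" site ids are handled.


def _lang_and_key(site_id: str, title: str) -> tuple[str, str]:
    """Split 'enwiki' into 'en' and turn the title into a path segment."""
    if not site_id.endswith("wiki") or site_id == "wiki":
        raise ValueError(f"unsupported site id: {site_id!r}")
    lang = site_id[: -len("wiki")]
    # Assumption: spaces become underscores; characters outside this safe
    # set are percent-encoded (a simplification of the real conventions).
    key = quote(title.replace(" ", "_"), safe="(),:'!*")
    return lang, key


def wikipedia_url(site_id: str, title: str) -> str:
    """URL of the Wikipedia page (the document) for a sitelink."""
    lang, key = _lang_and_key(site_id, title)
    return f"https://{lang}.wikipedia.org/wiki/{key}"


def dbpedia_uri(site_id: str, title: str) -> str:
    """URI of the DBpedia entity for the same sitelink: the same amount of work."""
    lang, key = _lang_and_key(site_id, title)
    # Assumption: English maps to dbpedia.org, other languages to a chapter
    # host such as de.dbpedia.org; chapters may encode titles differently.
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{host}/resource/{key}"


if __name__ == "__main__":
    print(wikipedia_url("enwiki", "Albert Einstein"))
    print(dbpedia_uri("enwiki", "Albert Einstein"))
```

Both functions share one derivation step and differ only in the final template, which is why storing either string explicitly would add redundancy rather than information.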

On that note, is it the current best practice that all linked data 
exports include links to all other datasets that contain related 
information (exhaustive two-way linking)? That seems like a lot of 
triples and not very feasible if the LOD Web grows (a bit like two-way 
HTML linking ... ;-). Wouldn't it be more practical to integrate via 
shared key values? In this case, Wikipedia URLs might be a sensible 
choice to indicate the topic of a resource, rather than requiring all 
resources that have a Wikipedia article as their topic to cross link to 
all (quadratically many) other such resources directly. I would be 
curious to hear your take on this.
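The scaling concern can be made concrete with a small sketch. Assuming each of n datasets has exactly one resource about the same Wikipedia topic, exhaustive two-way linking needs n(n-1) directed links, while stating a shared key costs one statement per resource; co-referent resources can then be found by grouping on that key instead of following direct cross links. The dataset prefixes and resource identifiers below are hypothetical.

```python
from collections import defaultdict


def two_way_link_count(n: int) -> int:
    """Directed links if each of n resources links to every other one."""
    return n * (n - 1)


def shared_key_link_count(n: int) -> int:
    """Statements if each of n resources only states its shared key."""
    return n


def group_by_shared_key(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (resource id, shared key) pairs so co-referent resources meet by key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for resource, key in records:
        groups[key].append(resource)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical resources from three datasets, all about the same article.
    topic = "https://en.wikipedia.org/wiki/Berlin"
    records = [
        ("datasetA:berlin", topic),
        ("datasetB:12345", topic),
        ("datasetC:city/berlin", topic),
    ]
    print(group_by_shared_key(records))
    for n in (3, 10, 100):
        print(n, two_way_link_count(n), shared_key_link_count(n))
```

For 100 datasets this is 9,900 directed links versus 100 key statements, which is the quadratic-versus-linear difference the question points at.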



 There are similar issues with most of the other identifiers: they are
 usually the main IDs of the database, not the URIs of the
 corresponding RDF data (if available).

 Hmm... if you look at the identifiers on the viewer's right-hand side,
 you will find out (depending on your understanding of Linked Open Data
 concepts) that they too identify entities that are associated with Web
 pages, rather than web pages themselves.

Sure, but you are confusing the purpose of URIs with the underlying 
technical standard here. People use identifiers to refer to entities, of 
course, yet they do not use identifiers that are based on the URI 
standard. We both know about the limitations of this approach, but that 
does not change the shape of the IDs people use to refer to things 
(e.g., on Freebase, but it is the same elsewhere). Usually, if you want 
to interface with such data collections (be it via UIs or via APIs), you 
need to use their official IDs, while URIs are not supported.

This is also the answer to your other comment. You are only seeing the 
purpose of the identifier, and you rightly say that there should be no 
big technical issue to use a URI instead. I agree, yet it has to be 
done, and it has to be done differently for each case. There is no 
general rule for how to construct URIs from the official IDs used by open 
data collections on today's Web.
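The point that each case must be handled differently can be sketched as a table of hand-written rules, one per source, with no general algorithm to fall back on. The source keys, ID formats and URI patterns below are assumptions chosen for illustration (the IMDb rule yields a page URL, since no entity URI is assumed there); they are not verified, current conventions of those services, and `to_uri` is a hypothetical helper.

```python
import re
from typing import Callable


def _freebase(mid: str) -> str:
    # Assumed rule: "/m/0d6lp" -> "http://rdf.freebase.com/ns/m.0d6lp"
    # (leading slash dropped, the remaining slash replaced by a dot).
    if not re.fullmatch(r"/m/[0-9a-z_]+", mid):
        raise ValueError(f"not a Freebase MID: {mid!r}")
    return "http://rdf.freebase.com/ns/" + mid.lstrip("/").replace("/", ".")


def _imdb(title_id: str) -> str:
    # Assumed rule: "tt" + digits maps to a page URL, not an entity URI.
    if not re.fullmatch(r"tt\d+", title_id):
        raise ValueError(f"not an IMDb title ID: {title_id!r}")
    return f"https://www.imdb.com/title/{title_id}/"


def _viaf(viaf_id: str) -> str:
    # Assumed rule: a numeric ID appended to a base URI.
    if not viaf_id.isdigit():
        raise ValueError(f"not a VIAF ID: {viaf_id!r}")
    return f"http://viaf.org/viaf/{viaf_id}"


# One hand-written rule per source; adding a source means writing a new rule.
URI_RULES: dict[str, Callable[[str], str]] = {
    "freebase": _freebase,
    "imdb": _imdb,
    "viaf": _viaf,
}


def to_uri(source: str, official_id: str) -> str:
    """Turn a source's official ID into a URI using that source's own rule."""
    try:
        rule = URI_RULES[source]
    except KeyError:
        raise ValueError(f"no URI rule known for source {source!r}") from None
    return rule(official_id)


if __name__ == "__main__":
    print(to_uri("freebase", "/m/0d6lp"))
    print(to_uri("imdb", "tt0111161"))
```

Even in this tiny example the three rules differ in validation and in how the ID is rewritten, which is the per-case effort the paragraph describes.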

A related problem is that most online data sets have UIs that are much 
more user-friendly than any LOD browser based on the RDF they export 
could be. There is no incentive for users to click on a LOD-based view of, 
say, IMDB, if they can just go to the IMDB page instead. This should be 
taken into account when building a DBpedia LOD view (back on topic! ;-): 
people who want to learn about something will usually be better served 
by going to Wikipedia; the target audience of the viewer is probably a 
different group who wants to inspect the DBpedia data set. This should 
probably affect how the UI is built, and maybe will lead to different 
design decisions than in the Wikidata browser I mentioned.

Markus

-- 
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/

___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion