Hi,

I'm currently deciding which of the DBpedia 3.8 dumps to load into our local 
mirror (and updating 
http://joernhees.de/blog/2012/05/25/setting-up-a-local-dbpedia-3-7-mirror-with-virtuoso-6-1-5/
).
I'm a bit clueless about three files on the download page that aren't 
explained, so maybe someone could shed some light on them (I already checked 
http://dbpedia.org/Downloads38, http://wiki.dbpedia.org/Datasets, 
http://wiki.dbpedia.org/Datasets/Properties and 
http://wiki.dbpedia.org/DatasetsLoaded):


- http://downloads.dbpedia.org/3.8/en/infobox_test_en.ttl.gz :
Probably just some leftover from testing?
### snip ###
<http://dbpedia.org/resource/Autism> 
<http://dbpedia.org/resource/Template:Infobox_disease> "Name"@en .
<http://dbpedia.org/resource/Autism> 
<http://dbpedia.org/resource/Template:Infobox_disease> "Alt"@en .
…
<http://dbpedia.org/resource/Anarchism> 
<http://dbpedia.org/resource/Template:Sister_project_links> "n"@en .
<http://dbpedia.org/resource/Anarchism> 
<http://dbpedia.org/resource/Template:Sister_project_links> "v"@en .
<http://dbpedia.org/resource/Agricultural_science> 
<http://dbpedia.org/resource/Template:Infobox_Occupation> "name"@en .
### /snip ###


- http://downloads.dbpedia.org/3.8/en/topical_concepts_en.ttl.bz2 and
- http://downloads.dbpedia.org/3.8/en/topical_concepts_unredirected_en.ttl.bz2 :
They seem to contain category pages that have an associated main article. That 
makes them very interesting, but they use the deprecated skos:subject while 
all other files use dcterms:subject, and the "redirect resolving" was only 
applied in the object position, not to the subjects:

http://downloads.dbpedia.org/3.8/en/topical_concepts_unredirected_en.ttl.bz2 :
### snip ###
<http://dbpedia.org/resource/Category:Futurama> 
<http://www.w3.org/2004/02/skos/core#subject> 
<http://dbpedia.org/resource/Futurama_(TV_series)> .
<http://dbpedia.org/resource/Futurama_(TV_series)> 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
<http://www.w3.org/2004/02/skos/core#Concept> .
### /snip ###

And now with redirects resolved (watch the "_(TV_series)" disappear in the 
object position but not the subject):
http://downloads.dbpedia.org/3.8/en/topical_concepts_en.ttl.bz2 :
### snip ###
<http://dbpedia.org/resource/Category:Futurama> 
<http://www.w3.org/2004/02/skos/core#subject> 
<http://dbpedia.org/resource/Futurama> .
<http://dbpedia.org/resource/Futurama_(TV_series)> 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
<http://www.w3.org/2004/02/skos/core#Concept> .
### /snip ###

Are there other places where this might cause "dangling subjects"?
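
To check this locally, here is a minimal Python sketch. It assumes each triple 
sits on a single line in the dump (the snippets above are only wrapped by my 
mail client) and that only URI-to-URI triples matter. The function name 
find_dangling is just something I made up for illustration:

```python
import re

SKOS_SUBJECT = "http://www.w3.org/2004/02/skos/core#subject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"

# Matches one triple whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")


def find_dangling(lines):
    """Return (typed_but_unreferenced, referenced_but_untyped) URI sets."""
    typed, referenced = set(), set()
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # skip comments, literals and anything unexpected
        s, p, o = m.groups()
        if p == RDF_TYPE and o == SKOS_CONCEPT:
            typed.add(s)        # resource declared as a skos:Concept
        elif p == SKOS_SUBJECT:
            referenced.add(o)   # resource used as a category's main topic
    return typed - referenced, referenced - typed


# The two triples from the redirect-resolved snippet above, unwrapped.
sample = [
    "<http://dbpedia.org/resource/Category:Futurama> "
    "<http://www.w3.org/2004/02/skos/core#subject> "
    "<http://dbpedia.org/resource/Futurama> .",
    "<http://dbpedia.org/resource/Futurama_(TV_series)> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2004/02/skos/core#Concept> .",
]
unreferenced, untyped = find_dangling(sample)
```

For the real file, something like 
bz2.open("topical_concepts_en.ttl.bz2", "rt", encoding="utf-8") can be passed 
in place of the sample list. Both sets should be empty if the redirects were 
resolved consistently.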

Despite these small issues it's actually quite an interesting dataset. Are 
there any other reasons why it wasn't loaded? Why not map it to a 
"dpo:categoryMain" property? (dcterms:subject is already provided by 
article_categories, and foaf:isPrimaryTopicOf wouldn't be correct: it would 
imply that one of them is a document, and it would cause problems as in older 
versions, where people used it to get the corresponding Wikipedia page for a 
topic and unintentionally got the category back.)
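
As a rough illustration of such a mapping, the predicate could be rewritten 
line by line before loading. This is only a sketch: the categoryMain URI below 
is hypothetical (no such property exists as far as I know), and it again 
assumes one whitespace-separated triple per line:

```python
SKOS_SUBJECT = "<http://www.w3.org/2004/02/skos/core#subject>"
# Hypothetical URI for the proposed property, used here only for illustration.
CATEGORY_MAIN = "<http://dbpedia.org/ontology/categoryMain>"


def remap_predicate(line):
    """Replace skos:subject with the hypothetical categoryMain predicate."""
    parts = line.split(None, 2)  # subject, predicate, rest of the line
    if len(parts) == 3 and parts[1] == SKOS_SUBJECT:
        parts[1] = CATEGORY_MAIN
        return " ".join(parts)
    return line  # all other triples pass through unchanged


before = (
    "<http://dbpedia.org/resource/Category:Futurama> "
    "<http://www.w3.org/2004/02/skos/core#subject> "
    "<http://dbpedia.org/resource/Futurama> ."
)
after = remap_predicate(before)
```

The rdf:type skos:Concept triples are left untouched by this. Whether they 
should be kept at all is a separate question.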


Cheers,
Jörn


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
