Hi,

I'm currently deciding which of the DBpedia 3.8 dumps to load into our local mirror (and updating http://joernhees.de/blog/2012/05/25/setting-up-a-local-dbpedia-3-7-mirror-with-virtuoso-6-1-5/ ). I'm a bit clueless about three files in the downloads that aren't explained, so maybe someone could shed some light on them. (I already checked http://dbpedia.org/Downloads38, http://wiki.dbpedia.org/Datasets, http://wiki.dbpedia.org/Datasets/Properties and http://wiki.dbpedia.org/DatasetsLoaded.)

- http://downloads.dbpedia.org/3.8/en/infobox_test_en.ttl.gz : Probably just a leftover from testing?

### snip ###
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/resource/Template:Infobox_disease> "Name"@en .
<http://dbpedia.org/resource/Autism> <http://dbpedia.org/resource/Template:Infobox_disease> "Alt"@en .
…
<http://dbpedia.org/resource/Anarchism> <http://dbpedia.org/resource/Template:Sister_project_links> "n"@en .
<http://dbpedia.org/resource/Anarchism> <http://dbpedia.org/resource/Template:Sister_project_links> "v"@en .
<http://dbpedia.org/resource/Agricultural_science> <http://dbpedia.org/resource/Template:Infobox_Occupation> "name"@en .
### /snip ###

- http://downloads.dbpedia.org/3.8/en/topical_concepts_en.ttl.bz2 and
- http://downloads.dbpedia.org/3.8/en/topical_concepts_unredirected_en.ttl.bz2 : They seem to contain category pages which have an associated main article. That makes them very interesting, but they seem to use the deprecated skos:subject while all other files use dcterms:subject, and the "redirect resolving" wasn't done for the subjects:

http://downloads.dbpedia.org/3.8/en/topical_concepts_unredirected_en.ttl.bz2 :
### snip ###
<http://dbpedia.org/resource/Category:Futurama> <http://www.w3.org/2004/02/skos/core#subject> <http://dbpedia.org/resource/Futurama_(TV_series)> .
<http://dbpedia.org/resource/Futurama_(TV_series)> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
### /snip ###

And now with redirects resolved (watch the "_(TV_series)" disappear in the object position but not the subject):

http://downloads.dbpedia.org/3.8/en/topical_concepts_en.ttl.bz2 :
### snip ###
<http://dbpedia.org/resource/Category:Futurama> <http://www.w3.org/2004/02/skos/core#subject> <http://dbpedia.org/resource/Futurama> .
<http://dbpedia.org/resource/Futurama_(TV_series)> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
### /snip ###

Are there other places where this might cause "dangling subjects"?

Despite these small issues it's actually quite an interesting dataset. Are there any other reasons why it wasn't loaded? Why not map it to a "dpo:categoryMain" property? (dcterms:subject is already provided by article_categories, and foaf:isPrimaryTopicOf wouldn't be correct: it would imply that one of them is a document, and it would cause problems as in older versions, where people used it to get the corresponding Wikipedia page for a topic and unintentionally got the category back.)

Cheers,
Jörn

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
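
P.S. To get a feeling for how many "dangling subjects" there are, a rough Python sketch like the one below might help. It collects every resource typed as skos:Concept and every object of a skos:subject triple, then reports the typed resources that no skos:subject triple points to. This is only a sketch under assumptions: the names dangling_concepts and dangling_in_dump and the local file path are made up for illustration, and the regex only handles plain IRI-IRI-IRI lines like the ones in the snippets above (no literals, no blank nodes).

```python
import bz2
import re

SKOS_SUBJECT = "http://www.w3.org/2004/02/skos/core#subject"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: every relevant line is a simple "<s> <p> <o> ." triple.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def dangling_concepts(lines):
    """Return resources typed skos:Concept that no skos:subject triple points to."""
    typed, targets = set(), set()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip comments, literals and anything else unexpected
        s, p, o = match.groups()
        if p == RDF_TYPE and o == SKOS_CONCEPT:
            typed.add(s)
        elif p == SKOS_SUBJECT:
            targets.add(o)
    return sorted(typed - targets)


def dangling_in_dump(path):
    """Run the same check over a local .bz2 dump (the path is hypothetical)."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return dangling_concepts(handle)


# The two triples quoted above from topical_concepts_en.ttl.bz2.
resolved = [
    "<http://dbpedia.org/resource/Category:Futurama> "
    "<http://www.w3.org/2004/02/skos/core#subject> "
    "<http://dbpedia.org/resource/Futurama> .",
    "<http://dbpedia.org/resource/Futurama_(TV_series)> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2004/02/skos/core#Concept> .",
]

# The corresponding triples from the unredirected file.
unredirected = [
    "<http://dbpedia.org/resource/Category:Futurama> "
    "<http://www.w3.org/2004/02/skos/core#subject> "
    "<http://dbpedia.org/resource/Futurama_(TV_series)> .",
    resolved[1],
]

print(dangling_concepts(resolved))
print(dangling_concepts(unredirected))
```

On the two snippets above, this flags Futurama_(TV_series) in the redirect-resolved file and nothing in the unredirected one. For a downloaded dump, something like dangling_in_dump("topical_concepts_en.ttl.bz2") would do the same over the whole file, holding both sets in memory.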
