** Kingsley, could you kindly add to live.dbpedia.org the same prefixes as dbpedia.org? Thanks!
I'm trying to compare the Places hierarchy (167 classes):
http://mappings.dbpedia.org/server/ontology/classes/#Place
against what's available in http://live.dbpedia.org/sparql.

1. Transitivity of rdfs:subClassOf is not implemented: this returns only the immediate subclasses (17):

  prefix dbo: <http://dbpedia.org/ontology/>
  select * {?x rdfs:subClassOf dbo:Place}

2. Adding a transitive closure (the + property path) still returns only 35:

  prefix dbo: <http://dbpedia.org/ontology/>
  select * {?x rdfs:subClassOf+ dbo:Place}

3. Adding "order by" increases the result to a lot more (170).
** Kingsley, this looks like a bug in Virtuoso.

  prefix dbo: <http://dbpedia.org/ontology/>
  select * {?x rdfs:subClassOf+ dbo:Place} order by ?x

4. Why does SPARQL return 4 more than the wiki?
- Department and OverseasDepartment are subclasses of each other, which somehow causes them to be listed twice
- Library is a subclass of both EducationalInstitution < Agent and Building << Place, and on the wiki it's listed under the first branch only
- I'm not sure why Prefecture appears twice, while the other subclasses of GovernmentalAdministrativeRegion appear once

-----

10. Now let's try to count places.

  prefix dbo: <http://dbpedia.org/ontology/>
  select count(*) {?x a dbo:Place}

755779. Good!

11. But are there instances of subclasses of Place that are not counted? Unfortunately yes.

  prefix dbo: <http://dbpedia.org/ontology/>
  select * {
    ?x a ?type.
    ?type rdfs:subClassOf+ dbo:Place
    filter not exists {?x a dbo:Place}
  } limit 1000

12. The above are CelestialBodies. Geonames doesn't have celestial bodies, so let's exclude them:

  prefix dbo: <http://dbpedia.org/ontology/>
  select * {
    ?x a ?type.
    ?type rdfs:subClassOf+ dbo:Place
    filter (?type != dbo:CelestialBody)
    filter not exists {?x a dbo:Place}
  } limit 1000

There are only 3 (ArchitecturalStructure, HistoricPlace), good!

13. But given 2 & 3 above, I am suspicious. Let's count by subclass:

  prefix dbo: <http://dbpedia.org/ontology/>
  select count(*) {
    ?x a ?type.
    ?type rdfs:subClassOf+ dbo:Place
  }

1922375. That's more than item 10 because a place is counted once per matching type, and there are multiple parent paths per place.

14. So let's use distinct (an expensive query):

  prefix dbo: <http://dbpedia.org/ontology/>
  select count(distinct ?x) {
    ?x a ?type.
    ?type rdfs:subClassOf+ dbo:Place
  }

Virtuoso returns nothing, but doesn't say that a query limit was exhausted. I tried it a second time and it returned 66331. That can't be right, it is far too few, so I think Virtuoso cuts off an internal result set. I tried increasing the Execution timeout to 90000 but I always get no result.

15. In contrast, the distinct variant of 3 correctly returns 167:

  prefix dbo: <http://dbpedia.org/ontology/>
  select count(distinct ?x) {?x rdfs:subClassOf+ dbo:Place}

---

My purpose is to compare the number of DBpedia places to Geonames -> en.wikipedia links (see https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/bash/process-geonames.txt).
Given 10 and 12, I think 756k is a fair assessment of DBpedia places less CelestialBodies.
A fresh count of the links shows 470k (and Geonames has 9M features), so:
- 62% of en.dbpedia places are linked to geonames
- 5.2% of geonames features are linked to en.dbpedia
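
PS. To make items 10 to 14 checkable by hand, here is a minimal Python sketch on a toy graph. Everything in it is an assumption for illustration: the triples are made-up toy data (not DBpedia content), the helper names are mine, and it assumes rdfs:subClassOf+ behaves as a plain transitive closure. It only shows why the per-type count (item 13) exceeds the distinct count (item 14), how a resource without a direct Place type escapes the count in item 10, and that a subclass cycle like Department/OverseasDepartment should not by itself produce duplicates in a closure computed as a set.

```python
from collections import defaultdict, deque

# Hypothetical toy data, NOT real DBpedia triples.
SUBCLASS_OF = [
    ("PopulatedPlace", "Place"),
    ("Settlement", "PopulatedPlace"),
    ("City", "Settlement"),
    ("Department", "Place"),
    # Mutual subclassing, mimicking the Department/OverseasDepartment cycle.
    ("OverseasDepartment", "Department"),
    ("Department", "OverseasDepartment"),
]

TYPES = [
    ("Paris", "City"),
    ("Paris", "Settlement"),
    ("Paris", "PopulatedPlace"),
    ("Paris", "Place"),
    # Mayotte deliberately lacks a direct "Place" type (compare item 11).
    ("Mayotte", "OverseasDepartment"),
    ("Mayotte", "Department"),
]


def subclasses_of(root, edges):
    """Return every class reachable from root via one or more inverse
    subClassOf steps (the '+' closure), as a set. The 'seen' set keeps
    cycles from looping forever or yielding duplicates."""
    children = defaultdict(list)
    for sub, sup in edges:
        children[sup].append(sub)
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def count_direct(types, cls):
    """Analogue of item 10: resources typed directly as cls."""
    return len({x for x, t in types if t == cls})


def count_rows(types, closure):
    """Analogue of item 13: one row per (resource, matching type) pair."""
    return sum(1 for _, t in types if t in closure)


def count_distinct(types, closure):
    """Analogue of item 14: each resource counted once."""
    return len({x for x, t in types if t in closure})


def missing_direct_type(types, closure, cls):
    """Analogue of item 11: typed with a subclass but not with cls itself."""
    direct = {x for x, t in types if t == cls}
    return {x for x, t in types if t in closure} - direct


closure = subclasses_of("Place", SUBCLASS_OF)
print(sorted(closure))
print(count_direct(TYPES, "Place"), count_rows(TYPES, closure),
      count_distinct(TYPES, closure))
print(missing_direct_type(TYPES, closure, "Place"))
```

On this toy graph the per-type count is larger than the distinct count simply because Paris matches three subclasses of Place, which is the same effect as 1922375 versus 755779 above. A correct count(distinct ?x) on the live endpoint should therefore land close to the figure from item 10, not far below it.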