** Kingsley, could you kindly add to live.dbpedia.org the same prefixes as
dbpedia.org? Thanks!

I'm trying to compare the Place hierarchy (167 classes):
http://mappings.dbpedia.org/server/ontology/classes/#Place 

against what's available in http://live.dbpedia.org/sparql.

1. Transitivity of rdfs:subClassOf is not implemented: this returns only
immediate subclasses (17):

prefix dbo: <http://dbpedia.org/ontology/>
select * {?x rdfs:subClassOf dbo:Place}

2. Adding a transitive closure (the "+" property path) still returns only 35:

prefix dbo: <http://dbpedia.org/ontology/>
select * {?x rdfs:subClassOf+ dbo:Place}

3. Adding "order by" to the same query raises the count considerably (170).
** Kingsley, this looks like a bug in Virtuoso.

prefix dbo: <http://dbpedia.org/ontology/>
select * {?x rdfs:subClassOf+ dbo:Place}
order by ?x

4. Why does SPARQL return 4 more than the wiki?
- Department and OverseasDepartment are subclasses of each other, which
somehow causes them to be listed twice
- Library is a subclass of both EducationalInstitution<<Agent and
Building<<Place, and on the wiki it's listed under the first branch only
- I'm not sure why Prefecture appears twice, while the other subclasses of
GovernmentalAdministrativeRegion appear once

-----

10. Now let's try to count places.

prefix dbo: <http://dbpedia.org/ontology/>
select count(*) {?x a dbo:Place}
755779
Good!

11. But are there instances of subclasses of Place that are not counted?
Unfortunately yes.

prefix dbo: <http://dbpedia.org/ontology/>
select * {
  ?x a ?type. ?type rdfs:subClassOf+ dbo:Place
  filter not exists {?x a dbo:Place}
} limit 1000

12. The above are CelestialBodies. Geonames doesn't have celestial bodies,
so let's exclude them:

prefix dbo: <http://dbpedia.org/ontology/>
select * {
  ?x a ?type. ?type rdfs:subClassOf+ dbo:Place
  filter (?type != dbo:CelestialBody)
  filter not exists {?x a dbo:Place}
} limit 1000

There are only 3 (ArchitecturalStructure, HistoricPlace), good!

13. But given 2 & 3 above, I am suspicious. Let's count by subclass:

prefix dbo: <http://dbpedia.org/ontology/>
select count(*) {
  ?x a ?type. ?type rdfs:subClassOf+ dbo:Place
}
1922375
That's more than item 10 because there are multiple parent paths per place.
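
To illustrate the overcount, here is a minimal Python sketch on toy data. The class names, instance names and typing are hypothetical, and it assumes each instance carries all of its ancestor types (which may not match the live data exactly). Each (instance, type) pair whose type is a proper subclass of Place yields one row, so an instance is counted once per matching type.

```python
# Hypothetical toy hierarchy: class -> direct superclasses.
SUBCLASS_OF = {
    "PopulatedPlace": ["Place"],
    "Settlement": ["PopulatedPlace"],
    "City": ["Settlement"],
    "NaturalPlace": ["Place"],
    "Mountain": ["NaturalPlace"],
}

# Hypothetical instances, each typed with its class and all ancestors.
TYPES = {
    "Sofia": ["City", "Settlement", "PopulatedPlace", "Place"],
    "Rila": ["Mountain", "NaturalPlace", "Place"],
}


def ancestors(cls):
    """Return all proper superclasses of cls (like rdfs:subClassOf+)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Equivalent of: ?x a ?type. ?type rdfs:subClassOf+ dbo:Place
rows = [(x, t) for x, ts in TYPES.items() for t in ts if "Place" in ancestors(t)]

row_count = len(rows)                        # count(*)
distinct_count = len({x for x, _ in rows})   # count(distinct ?x)
print(row_count, distinct_count)
```

On this toy data the row count is larger than the number of distinct instances, which is the same effect as the difference between items 10 and 13.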

14. So let's use distinct (an expensive query)

prefix dbo: <http://dbpedia.org/ontology/>
select count(distinct ?x) {
  ?x a ?type. ?type rdfs:subClassOf+ dbo:Place
}

Virtuoso returns nothing, but doesn't report that a query limit was
exhausted. I tried a second time and it returned 66331.
But that can't be right: it is far too few. So I think it truncates an
internal result set.
I tried increasing Execution timeout to 90000, but I always get no result.

15. In contrast, the distinct variant of 3 correctly returns 167:

prefix dbo: <http://dbpedia.org/ontology/>
select count(distinct ?x) {?x rdfs:subClassOf+ dbo:Place}
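
For intuition about what this distinct count computes, here is a small Python sketch of a transitive closure that visits each class once. The edges are a hypothetical toy fragment, not the real ontology; they include a mutual-subclass pair like the Department/OverseasDepartment case in item 4, to show that a set-based closure terminates and counts each class once.

```python
from collections import deque

# Hypothetical toy edges (child, parent); not the real DBpedia ontology.
EDGES = [
    ("PopulatedPlace", "Place"),
    ("AdministrativeRegion", "PopulatedPlace"),
    ("Department", "AdministrativeRegion"),
    ("OverseasDepartment", "Department"),
    ("Department", "OverseasDepartment"),  # mutual subclass cycle
    ("Building", "Place"),
    ("Library", "Building"),
    ("Library", "EducationalInstitution"),  # second parent, outside Place
]


def subclasses_of(root, edges):
    """Return the set of all direct and indirect subclasses of root."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    seen = set()
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:  # skip classes already reached
                seen.add(child)
                queue.append(child)
    return seen


result = subclasses_of("Place", EDGES)
print(len(result), sorted(result))
```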

---

My purpose is to compare the number of DBpedia places to Geonames ->
en.wikipedia links
(see
https://github.com/dbpedia/extraction-framework/blob/dump/scripts/src/main/bash/process-geonames.txt).
Given 10 and 12, I think 756k is a fair estimate of DBpedia places excluding
CelestialBodies.
A fresh count of the links shows 470k (and Geonames has 9M features), so:
- 62% of en.dbpedia places are linked to geonames
- 5.2% of geonames features are linked to en.dbpedia
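
The two percentages can be checked with a few lines of Python. The link and feature counts are the rounded figures quoted above (470k and 9M), so the results are approximate.

```python
dbpedia_places = 755_779       # from item 10
geonames_links = 470_000       # "470k", rounded
geonames_features = 9_000_000  # "9M", rounded

# Share of DBpedia places that have a Geonames link.
share_of_dbpedia = geonames_links / dbpedia_places
# Share of Geonames features that link to en.dbpedia.
share_of_geonames = geonames_links / geonames_features

print(f"{share_of_dbpedia:.1%} of DBpedia places are linked")
print(f"{share_of_geonames:.1%} of Geonames features are linked")
```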


