I've been poking about a bit and found the following things:
(1) The browser at
http://mappings.dbpedia.org/server/ontology/classes doesn't work with
Google Chrome. It works OK for me in Firefox.
(2) I'd like to see more definite comments for the items. For
instance, I'd like to see something in the definition of 'City' that
gives a specific answer to the Tokyo and London question. If this isn't
stated somewhere, things are either going to be random or we'll be
having edit wars. Personally I'd be willing to put my opinion in
there, but I'd like to see some process as to how this gets done.
(3) There is no one simple reason for why the city assignments get
lost because the infobox mappings are pretty complicated. For
instance, places like Manchester NH, NYC and Sao Paulo have an
Infobox:Settlement, and the "City" designation should be triggered by
settlement_type = City
I noticed, however, that Manchester has
settlement_type = [[City]]
which is "reasonable" (certainly reflects linked data thinking) but I
don't know if the extractor is going to get that. On the other hand,
if you look at the entry for Dresden, Dresden has
Infobox:German_Location and the "Citiness" of Dresden is triggered by
the line
Art = City
in the infobox. There's also an Infobox for Japanese_City, so I'm sure
that there are a lot of details.
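
To make the two cases above concrete, here is a minimal sketch of a check that treats "City" and "[[City]]" alike. The rule table and the function names are my own assumptions for illustration; this is not the extractor framework's actual mapping format or API, and I don't know whether the real extractor strips the link markup.

```python
import re

# Hypothetical rule table: (infobox template, field) -> values that imply "City".
# The two entries mirror the Settlement and German_Location examples above.
CITY_RULES = {
    ("Settlement", "settlement_type"): {"city"},
    ("German_Location", "Art"): {"city"},
}

# Matches [[Target]] and [[Target|label]]; group 1 = target, group 2 = label.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")

def normalize_value(raw):
    """Replace wikilinks with their visible text, then trim and lowercase."""
    text = WIKILINK.sub(lambda m: m.group(2) or m.group(1), raw)
    return text.strip().lower()

def is_city(template, fields):
    """Return True if any rule for this template matches a normalized field value."""
    for (rule_template, field), accepted in CITY_RULES.items():
        if rule_template == template and field in fields:
            if normalize_value(fields[field]) in accepted:
                return True
    return False
```

With this, is_city("Settlement", {"settlement_type": "[[City]]"}) and is_city("German_Location", {"Art": "City"}) both come out true, so the plain-text and hyperlinked spellings end up on the same path.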
(4) If there's a root cause for the problem, it's that there isn't
a closed feedback loop. If you're looking at this as a problem of
"transforming something from form A to form B" it's clear that the
system produces "B". It's only when you actually try to use "B" that
you find that "B" is full of holes. Overall it's a system problem: I'm
sure that we can get better results by changing the extractor rules (in
fact, we'll get the fastest gains this way) but that some changes to
Wikipedia content will be necessary too.
The complexity of the infoboxes means that an agent that does these
corrections could be a bit complex, although its behavior could
probably be controlled by the infobox mappings. For instance, it may
end up doing something a bit different for a "German Location" than it
would for a "Settlement".
Along the way it's also tempting to do some canonicalization. For
instance, the word "City" in the infobox header for NYC is just plain
text, but the word "City" for Manchester NH is a hyperlink. You can
make a case for both, but from a quality standpoint, the same thing
should be done in both cases.
In the case of the German locations I see that the English words
"Town" and "City" are often used in the "type" and "art" fields, but
the word "Stadt" is treated by the framework as if it were synonymous with
"Town", which, from what little German I know, isn't quite right
(isn't Munich a /Großstadt/?). But perhaps the word "Stadt" has some
special semantics in the context of Wikipedia, and it ought to be
preserved. Ultimately it seems that Wikipedia ought to make up its mind.
Overall, correcting Wikipedia is going to involve dealing with
entropy, dealing with politics, and probably the careful re-injection
of entropy to satisfy political constraints.
(5) I can think of a lot of toolage that would be useful here. For
instance, it would be nice to be able to look at "City" and get a list
of rules that would cause something to be identified as a "City". If I
go down this path far enough, I'm probably going to buy a new (wicked
fast) hard drive, install the extractor framework, and want to get
justifications about why the system made the assignments that it did and
progressively identify the causes of misidentifications.
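
As a rough sketch of the kind of tool I mean: if the mapping rules could be flattened into rows, both "which rules make something a City?" and "why did this infobox get this class?" become simple lookups. The row format and the function names below are assumptions of mine, not something the extractor framework provides, and the "Town" row is purely illustrative.

```python
from collections import defaultdict

# Hypothetical flattened mapping rules: (template, field, value, ontology class).
# The first two rows mirror the examples in this message; the third is illustrative.
RULES = [
    ("Settlement", "settlement_type", "City", "City"),
    ("German_Location", "Art", "City", "City"),
    ("Settlement", "settlement_type", "Town", "Town"),
]

def rules_by_class(rules):
    """Group rules by the class they assign, so one can ask 'what makes a City?'."""
    index = defaultdict(list)
    for template, field, value, cls in rules:
        index[cls].append((template, field, value))
    return dict(index)

def explain(fields, template, rules):
    """Return the rules that fired for one infobox, as a justification list."""
    return [
        (t, f, v, cls)
        for t, f, v, cls in rules
        if t == template and fields.get(f) == v
    ]
```

An empty list from explain() is itself useful: with exact matching as written here, explain({"settlement_type": "[[City]]"}, "Settlement", RULES) returns nothing, which points straight at the hyperlinked value as a possible cause of the miss.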
It seems like the first thing I ought to do is make a list of cities
that aren't identified in DBpedia, and then the next stage is to work
down that list and find the problems.
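
The first step is essentially a set difference. A minimal sketch, assuming I have a hand-made list of places that ought to be cities and the set of resources the dataset actually types as City. Both inputs below are placeholders, not real query results.

```python
def missing_cities(expected, typed_as_city):
    """Return the names in `expected` that are absent from `typed_as_city`, sorted."""
    return sorted(set(expected) - set(typed_as_city))

# Placeholder inputs for illustration only; these are NOT real extraction results.
expected = ["Tokyo", "London", "Dresden", "Manchester, New Hampshire"]
typed_as_city = {"Dresden"}

# The worklist to go through one by one, looking for the cause of each miss.
worklist = missing_cities(expected, typed_as_city)
```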
(6) I'm also interested in a mapping between DBpedia ontology
concepts and Wikipedia pages... For instance, there's
http://en.wikipedia.org/wiki/City
Practically, if I'm building a site that uses the dbpedia ontology
(or something similar) I'm going to want to have user-friendly pages
that have something to say about the taxonomic classes that the site uses.
(7) Along those lines, dbpedia-owl:Building really drives me nuts.
As Wikipedia puts it,
1. Any human-made structure used or intended for supporting or
sheltering any use or continuous occupancy, or
2. An act of construction (i.e. the activity of
building, see also builder)
dbpedia-owl:Building has a number of subclasses under it which don't
match the vernacular meaning of the word "Building", which is meaning
#1. Practically, I'd say that a building provides an environmental
shell, and would not include
Airport, Bridge, and LaunchPad
I would include the Vehicle Assembly Building at Cape Kennedy as a
Building, however, since that provides an environmental shell. I'd
say that a Barn or even a 3-sided run-in shed is a "Building" because
there's a full or partial environmental shell, and that Stations and
Stadiums are generally buildings, because they are human-inhabited and
provide at least a partial environmental shell.
I wouldn't mind using the word "Structure" for what "Building" is
now, and I'd probably want to move "Monument" under it as well.
(Note: the "environmental shell" is an issue with monuments too. It's
possible to go inside the Statue of Liberty, but it's a special tour
and takes some effort to do. Does that make the Statue of Liberty a
building? The Gateway Arch, Eiffel Tower and Tokyo Tower probably all
fall into the "Building" category because they all have a high level of
accommodation for visitors.)
Specifically, the issue I've got is that ordinary users are going
to have a hard time with the statement that "a Bridge is a Building";
for projects like ny-pictures.com I really need some category that
corresponds to the vernacular use of the word "Building" and that avoids
things that look "crazy".
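
To show what the rearrangement would buy, here is a small sketch of the hierarchy I'm suggesting, with "Structure" on top and "Building" reserved for things with an environmental shell. The parent links are my proposal only, not the current dbpedia-owl hierarchy, and the "Station" and "Stadium" class names are assumed.

```python
# Hypothetical parent links for the rearrangement suggested above; this is
# NOT the current dbpedia-owl hierarchy.
PROPOSED_PARENT = {
    "Building": "Structure",
    "Airport": "Structure",
    "Bridge": "Structure",
    "LaunchPad": "Structure",
    "Monument": "Structure",
    "Station": "Building",
    "Stadium": "Building",
}

def is_a(cls, ancestor, parent=PROPOSED_PARENT):
    """Walk parent links upward; True if `ancestor` is `cls` or one of its ancestors."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parent.get(cls)
    return False
```

Under these links a Bridge is a Structure but not a Building, while a Stadium is both, which is the answer an ordinary user would expect.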