Following the idea that "the dbpedia ontology is a classification of
wikipedia pages", I propose that there should be a category for "List
of" pages.
A search for dbpedia resources whose labels start with "List of"
turns up:
> select count(*) from dbpedia where label like 'List of%' and is_redirect=0;
+----------+
| count(*) |
+----------+
|    54435 |
+----------+
This is about 1.7% of all dbpedia resources. Compared with 473k
categories, lists are a bit over 10% as prevalent as categories.
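For anyone who wants to try the same count on a small sample, here is a rough sketch in Python using the standard sqlite3 module. The table layout (a `dbpedia` table with `label` and `is_redirect` columns) is only inferred from the query above, and the rows are made-up toy data, not real dbpedia output. The last lines just redo the lists-versus-categories arithmetic from the two figures quoted above.

```python
import sqlite3

# Toy stand-in for the table queried above. The real schema is not shown in
# this post; the two columns are assumed from the WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbpedia (label TEXT, is_redirect INTEGER)")
conn.executemany(
    "INSERT INTO dbpedia VALUES (?, ?)",
    [
        ("List of zoos", 0),
        ("List of state flowers", 0),
        ("List of zoo", 1),   # a redirect, so the query skips it
        ("Berlin", 0),
        ("Listeria", 0),      # starts with "List" but not with "List of"
    ],
)


def count_list_pages(connection):
    """Count non-redirect resources whose label starts with 'List of'."""
    row = connection.execute(
        "SELECT count(*) FROM dbpedia "
        "WHERE label LIKE 'List of%' AND is_redirect = 0"
    ).fetchone()
    return row[0]


def ratio_percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


toy_count = count_list_pages(conn)

# Figures quoted in the post: 54435 list pages and roughly 473k categories.
lists_vs_categories = ratio_percent(54435, 473000)

print(toy_count, round(lists_vs_categories, 1))
```

Note that LIKE is case-insensitive for ASCII text in SQLite by default, so labels starting with "List Of" would be counted as well; other databases may behave differently depending on collation.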
One aspect of this is that "List Of" pages aren't anything else --
although a ListOf page may be about a Person, Place, Invention, Works
or whatever, a ListOf page doesn't map directly to a named entity in
the outside world. For instance, in the set of resources I'm looking
at intensively, I've had 86 "list of" pages turn up; clearly a ListOf
shouldn't receive the same processing that, say, a Place should get,
but I could certainly use the ListOf pages to expand my list of topics
in the problem domain.
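To make the "different processing" point concrete, here is a minimal sketch of how ListOf pages could be split off from ordinary resources before entity processing. The function names and the label-prefix test are my own illustration, not anything dbpedia provides, and the sample labels are invented.

```python
def is_list_page(label):
    """Heuristic only: treat a label starting with 'List of ' as a ListOf page."""
    return label.lower().startswith("list of ")


def partition_resources(labels):
    """Split labels into (list_pages, entity_pages), preserving input order."""
    list_pages, entity_pages = [], []
    for label in labels:
        if is_list_page(label):
            list_pages.append(label)
        else:
            entity_pages.append(label)
    return list_pages, entity_pages


sample = ["List of zoos", "San Diego Zoo", "List of state flowers", "Listeria"]
list_pages, entity_pages = partition_resources(sample)

# list_pages can feed topic expansion; entity_pages go through the
# usual Person/Place/etc. handling.
print(list_pages)
print(entity_pages)
```

The prefix test requires a space after "of", which is slightly stricter than the SQL pattern 'List of%' above.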
The actual content of ListOf pages is quite variable: some of them
could be replaced easily with a category, but many of them contain
hierarchical information
http://en.wikipedia.org/wiki/List_of_zoos
or ordered lists
http://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Original_Series_episode
Some of them would make nice tables in a relational database...
http://en.wikipedia.org/wiki/List_of_zx_spectrum_games
The person who tried to extract a list of state flowers from dbpedia
would have found the information they were looking for here:
http://en.wikipedia.org/wiki/List_of_state_flowers
Some of the above are obviously useful from an information
extraction viewpoint, but some are a real mess:
http://en.wikipedia.org/wiki/List_of_battery_sizes
In particular, that page contains several different lists of
battery sizes (call that a hierarchy) and also a list of
battery chemistries, which doesn't fit the title.
For what it's worth, metaweb seems to largely remove "ListOf" pages
when adding wikipedia resources to Freebase.