Following the idea that "the dbpedia ontology is a classification of 
wikipedia pages",  I propose that there should be a category for "List 
of" pages.

    A search for dbpedia resources with labels that start with "List of" 
turns up:

> select count(*) from dbpedia where label like 'List of%' and is_redirect=0;
+----------+
| count(*) |
+----------+
|    54435 |
+----------+


    This is about 1.7% of all dbpedia resources,  and compares to 473k 
categories,  so lists are roughly a tenth as prevalent as categories 
(54435 / 473k is about 11.5%).
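
    For anyone who wants to reproduce the count and the ratio,  here is 
a rough sketch in Python.  It assumes a table shaped like the one in my 
query above (a "dbpedia" table with "label" and "is_redirect" columns);  
the in-memory SQLite database and the four toy rows are made up purely 
for illustration,  and the function names are my own.

```python
import sqlite3


def count_list_pages(conn):
    """Count non-redirect resources whose label starts with 'List of'."""
    row = conn.execute(
        "SELECT COUNT(*) FROM dbpedia "
        "WHERE label LIKE 'List of%' AND is_redirect = 0"
    ).fetchone()
    return row[0]


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Toy stand-in for the real table (hypothetical rows, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbpedia (label TEXT, is_redirect INTEGER)")
conn.executemany(
    "INSERT INTO dbpedia VALUES (?, ?)",
    [
        ("List of zoos", 0),
        ("List of state flowers", 0),
        ("List of battery sizes", 1),  # marked as a redirect, so excluded
        ("Berlin", 0),
    ],
)

toy_count = count_list_pages(conn)

# Ratio of lists to categories, using the figures quoted above.
list_vs_categories = share(54435, 473000)
print(toy_count, round(list_vs_categories, 1))
```

    Note that LIKE matches ASCII case-insensitively by default in 
SQLite,  so labels starting with "list of" would be counted as well.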

    One aspect of this is that "List Of" pages aren't anything else -- 
although a ListOf page may be about a Person,  Place,  Invention,  Works 
or whatever,  a ListOf page doesn't map directly to a named entity in 
the outside world.  For instance,  in the set of resources I'm looking 
at intensively,  I've had 86 "list of" pages turn up;  clearly a ListOf 
shouldn't receive the same processing that,  say,  a Place should get,  
but I could certainly use the ListOf pages to expand my list of topics 
in the problem domain.
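
    As a minimal sketch of what I mean by handling them separately:  
split the resources on the label prefix before doing any entity-type 
processing.  The function names here are hypothetical,  and the prefix 
test is only a heuristic on the label;  a proper ontology class would 
make it unnecessary.

```python
def is_list_page(label):
    """Heuristic: treat any label starting with 'List of ' as a list page."""
    return label.lower().startswith("list of ")


def partition_resources(labels):
    """Split labels into (list pages, everything else), preserving order."""
    lists, entities = [], []
    for label in labels:
        if is_list_page(label):
            lists.append(label)
        else:
            entities.append(label)
    return lists, entities


# Hypothetical sample labels for illustration.
sample = ["List of zoos", "Berlin", "Listeria", "List of state flowers"]
list_pages, other_resources = partition_resources(sample)
print(list_pages)
print(other_resources)
```

    The list pages can then be mined for further topics,  while only the 
remaining resources go through the Person/Place style processing.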

    The actual content of ListOf pages is quite variable:  some of them 
could be replaced easily with a category,  but many of them contain 
hierarchical information,

http://en.wikipedia.org/wiki/List_of_zoos

    or ordered lists:

http://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Original_Series_episode

    Some of them would make nice tables in a relational database...

http://en.wikipedia.org/wiki/List_of_zx_spectrum_games

    The person who tried to extract a list of state flowers from dbpedia 
would have found the information they were looking for here:

http://en.wikipedia.org/wiki/List_of_state_flowers

    Some of the above are obviously useful from an information 
extraction viewpoint,  but some are a real mess

http://en.wikipedia.org/wiki/List_of_battery_sizes

    in particular,  that page contains several different lists of 
battery sizes (call that a hierarchy) and also contains a list of 
battery chemistries,  which isn't compatible with the title.

    For what it's worth,  metaweb seems to largely remove "ListOf" pages 
when adding wikipedia resources to Freebase.
   

