I've been looking at the DBpedia data and noticed the following
issue. There seem to be categories in articlecategories_en that don't
exist in categories_label_en, for instance

http://dbpedia.org/resource/Category:The_Like_Young_albums

If I look in the label file,

$ bzcat ~/dbpedia3.2/categories_label_en.nt.bz2 | grep The_Like_Young

I only find

<http://dbpedia.org/resource/Category:The_Like_Young_songs> 
<http://www.w3.org/2000/01/rdf-schema#label> "The Like Young songs"@en .

which doesn't match. I found 31,695 cases like this. I could
either ignore these categories or make up labels for them from
the URLs, but it may point to a deeper problem.
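
For reference, here is a rough sketch of the kind of check I mean, plus
the "make up a label from the URL" fallback. It assumes one triple per
line, with the category as the subject in the label file and as the
object in the article-category file. The function names, the sample
article URI and the skos:subject predicate in the sample line are only
for illustration; the real files would be streamed from the .bz2 dumps
instead of a list.

```python
import re
from urllib.parse import unquote

CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"
URI_PATTERN = re.compile(r"<([^>]*)>")


def uris_at(lines, index):
    """Collect the URI at the given position (0 = subject, 2 = object)."""
    found = set()
    for line in lines:
        uris = URI_PATTERN.findall(line)
        if len(uris) > index:
            found.add(uris[index])
    return found


def unlabeled_categories(article_category_lines, label_lines):
    """Categories used as objects in article lines but absent from labels."""
    used = uris_at(article_category_lines, 2)
    labeled = uris_at(label_lines, 0)
    return {u for u in used - labeled if u.startswith(CATEGORY_PREFIX)}


def label_from_uri(uri):
    """Guess a label: strip the prefix, decode %XX, underscores to spaces."""
    if not uri.startswith(CATEGORY_PREFIX):
        raise ValueError("not a category URI: " + uri)
    return unquote(uri[len(CATEGORY_PREFIX):]).replace("_", " ")


# Sample input. The article URI and predicate below are made up for
# illustration; only the two category URIs come from the real data.
article_lines = [
    "<http://dbpedia.org/resource/Example_Article> "
    "<http://www.w3.org/2004/02/skos/core#subject> "
    "<http://dbpedia.org/resource/Category:The_Like_Young_albums> .",
]
label_lines = [
    "<http://dbpedia.org/resource/Category:The_Like_Young_songs> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "The Like Young songs"@en .',
]

missing = unlabeled_categories(article_lines, label_lines)
guessed = {uri: label_from_uri(uri) for uri in missing}
```

A guessed label will not always match what Wikipedia shows, so this is
only a stopgap, not a fix for whatever is dropping the labels.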

I'm also thinking about enclosure relationships between categories:  If 
I look at wikipedia,  I find pages like:

http://en.wikipedia.org/wiki/Category:Chemistry

Note that Chemistry contains subcategories such as

http://en.wikipedia.org/wiki/Category:Acid-base_chemistry

Perhaps I'm missing something, but I don't see subcategory
relationships kept track of in DBpedia. I know that Wikipedia
categories are pretty messy, but I've found that graph traversals and
filtering can be applied to them to find members of classes that slip
through the cracks of more rigorous taxonomies -- I used methods like
that in the construction of

http://carpictures.cc/
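
To make the traversal idea concrete, here is a minimal sketch of what I
would do if subcategory links were available. It assumes they arrive as
(child, parent) pairs, which is a hypothetical format and not something
in the current dumps. The function names and the "Example_subtopic"
category are made up, and the last edge is an artificial cycle, since
the category graph is not a clean tree.

```python
from collections import defaultdict, deque


def build_children(edges):
    """Index (child, parent) pairs as parent -> set of children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants(root, children, max_depth=None):
    """Breadth-first walk; returns {category: depth below root}."""
    seen = {root}
    found = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # depth limit keeps the walk from drifting off-topic
        for child in children.get(node, ()):
            if child not in seen:  # skip anything already visited (cycles)
                seen.add(child)
                found[child] = depth + 1
                queue.append((child, depth + 1))
    return found


# Hypothetical edges; only the first pair reflects the Wikipedia pages above.
edges = [
    ("Category:Acid-base_chemistry", "Category:Chemistry"),
    ("Category:Example_subtopic", "Category:Acid-base_chemistry"),
    ("Category:Chemistry", "Category:Acid-base_chemistry"),  # artificial cycle
]

children = build_children(edges)
all_below = descendants("Category:Chemistry", children)
direct_only = descendants("Category:Chemistry", children, max_depth=1)
```

The depth limit is where the filtering comes in: the deeper the walk
goes, the more unrelated categories tend to creep in, so the results
need pruning before they are useful.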

Are there any plans to improve category parsing in future dbpedia versions?


