Hi Daniel,
A list of persons, extracted with the procedure described below, is
available at http://comupedia.org/downloads/
1. Build a list of occupations (terms like accountant, actor, actress, actuary)
from http://en.wikipedia.org/wiki/List_of_occupations and a list of
nationalities (terms like afghani, albanian, algerian) from
http://en.wikipedia.org/wiki/List_of_nationalities
2. Go through the list of Wikipedia articles and consider an article to be
about a person if its categories contain at least one term from the
list of occupations AND at least one from the list of nationalities.
3. I also added some rules on titles to eliminate relatively frequent errors
(titles containing "s' ", such as "1984 Australian Drivers' Championship", or
titles standing for roles, such as "Prince of Wales").
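
To make the steps above concrete, here is a rough Python sketch of the filter. It is only an illustration, not the code I actually ran: the function names, the word-level matching of category names, the naive plural stripping ("actors" -> "actor") and the role_titles set are all assumptions made for the example. Multi-word occupations are not handled.

```python
import re


def tokens(text):
    """Lowercase word tokens of a string, plus a naive singular form.

    Assumption: stripping a trailing "s" is enough to match plural
    category names such as "actors" against the term "actor".
    """
    words = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        words.add(word)
        if word.endswith("s"):
            words.add(word[:-1])
    return words


def looks_like_person(title, categories, occupations, nationalities,
                      role_titles=frozenset()):
    """Return True if the article is guessed to be about a person."""
    # Step 3: title rules. Reject plural possessives ("Drivers' ...")
    # and titles known to stand for roles (hypothetical role_titles set).
    if "s' " in title or title in role_titles:
        return False

    # Collect every word that appears in any category name.
    category_words = set()
    for category in categories:
        category_words |= tokens(category)

    # Step 2: at least one occupation AND at least one nationality.
    has_occupation = bool(category_words & occupations)
    has_nationality = bool(category_words & nationalities)
    return has_occupation and has_nationality
```

For example, with occupations = {"actor"} and nationalities = {"american"}, an article whose categories include "American film actors" passes the test, while one whose title is "1984 Australian Drivers' Championship" is rejected by the title rule regardless of its categories.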
The list contains 499769 names. Please let me know if you find any recurring
errors.
Adrian
________________________________
From: Daniel Naber <[email protected]>
To: Popescu Adrian <[email protected]>
Sent: Thursday, 30 July, 2009 23:57:29
Subject: Re: [Dbpedia-discussion] The Untyped
On Thursday 30 July 2009, Popescu Adrian wrote:
Hi Adrian,
> Should you be interested, I can provide results samples in order for you
> to check results.
I'm interested - would it maybe even be possible for you to publish the
complete list?
Regards
Daniel
--
http://www.danielnaber.de
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion