Eric,

Jean Godby and I have been looking into this very problem. First, I
want to draw your attention to the difference between NER and the
subsequent problem of Identity Resolution. For example, in a given
text, an NER tool would identify "Kennedy" as a name, but that name
could refer to several different people. If you're able to get more
information (dates, titles, etc.) from the text for a given reference,
you can do a better job of resolving the correct identity. Second,
Jean and I planned to use WorldCat Identities [1] as our endpoint and
as a part of our identity resolution mechanism. With extra data, like
a birth and/or death year, you can really zero in on an identity.

[1] http://www.worldcat.org/identities
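
To make the NER versus identity resolution distinction concrete, here is a
minimal sketch in Python. It assumes you already have a list of candidate
identities for a name that NER found. The Candidate class, the resolve()
function, and the hand-written sample records are all hypothetical and exist
only for illustration; they are not the WorldCat Identities API or its data
format. The sketch only shows how a birth or death year pulled from the text
narrows several "Kennedy" candidates down to one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One possible identity for a name found by NER (illustrative record)."""
    label: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


# Hand-written sample data standing in for whatever an identity service
# would return. This is an assumption, not real service output.
CANDIDATES = [
    Candidate("Kennedy, John F. (John Fitzgerald)", 1917, 1963),
    Candidate("Kennedy, Robert F.", 1925, 1968),
    Candidate("Kennedy, Edward M. (Edward Moore)", 1932, 2009),
]


def resolve(
    surname: str,
    candidates: List[Candidate],
    birth_year: Optional[int] = None,
    death_year: Optional[int] = None,
) -> List[Candidate]:
    """Return the candidates that match the surname and any years supplied.

    A year that is not supplied (None) places no constraint on the match.
    """
    prefix = surname.lower() + ","
    matches = []
    for candidate in candidates:
        if not candidate.label.lower().startswith(prefix):
            continue
        if birth_year is not None and candidate.birth_year != birth_year:
            continue
        if death_year is not None and candidate.death_year != death_year:
            continue
        matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The name alone is ambiguous: every sample record matches.
    print(len(resolve("Kennedy", CANDIDATES)))
    # A death year found near the name in the text narrows the result.
    for candidate in resolve("Kennedy", CANDIDATES, death_year=1968):
        print(candidate.label)
```

In practice the candidate list would come from a lookup against an identity
service rather than a hard-coded list, and exact year matching would probably
need to tolerate missing or approximate dates.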

/dev

-- 
Devon Smith
Consulting Software Engineer
OCLC Office of Research
http://www.oclc.org/research/people/smith.htm

On Mon, May 16, 2011 at 8:33 AM, Eric Lease Morgan <emor...@nd.edu> wrote:
> What are some of the ways to best insert Linked Data endpoints into an XML 
> file?
>
> I have been playing lately with named-entity recognition/extraction 
> technology. [1] Feed a text file, such as a novel, into the recognition 
> program. Get back a rudimentary XML file where things like names, places, and 
> organizations are marked with simple tags. I can then extract all the place 
> names from a text, tabulate them, display a word-cloud, allow the reader to 
> select items, guess latitude and longitude of the place, and finally plot 
> them on a map. [2] This process works pretty well, but Google Maps only 
> allows me to plot a limited number of items at a time. Consequently, I am 
> thinking about preprocessing my data by looping through the XML file and 
> adding latitude and longitude attributes to the place name elements.
>
> I then got to thinking about names and organizations. It would be nice to 
> supplement these entities with canonical Linked Data endpoints. My 
> application could then read the endpoints, extract the links associated with 
> them, and display some sort of graphic illustrating relationships. Finally, I 
> could allow the reader to select a relationship for further investigation.
>
> Given a name -- say, Plato or Thoreau -- how would one go about identifying 
> good endpoints? What sort of query would I send to what sort of "database"? 
> What might I get back? Assuming my goal is to enrich the text, what sort of 
> link(s) should I insert into my XML?
>
> [1] NER - http://bit.ly/e0SnA6
> [2] geo-location for WebKit mobile - http://bit.ly/msIu16
>
> --
> Eric Morgan
> University of Notre Dame
>



-- 
Sent from my GMail account.
