What are some of the best ways to insert Linked Data endpoints into an XML file?

I have been playing lately with named-entity recognition/extraction technology. 
[1] I feed a text file, such as a novel, into the recognition program and get 
back a rudimentary XML file where things like names, places, and organizations 
are marked with simple tags. I can then extract all the place names from the 
text, tabulate them, display them as a word cloud, allow the reader to select 
items, guess the latitude and longitude of each selected place, and finally 
plot the places on a map. [2] This process works pretty well, but Google Maps 
only allows me to plot a limited number of items at a time. Consequently, I am 
thinking about preprocessing my data by looping through the XML file and adding 
latitude and longitude attributes to the place name elements.
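
A minimal sketch of that preprocessing loop, in Python, might look like the 
following. The LOCATION tag name, the lat/lon attribute names, and the small 
GAZETTEER dictionary are all assumptions made for illustration; the dictionary 
stands in for whatever geocoding service or lookup table is actually used, and 
its coordinates are approximate.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Approximate values for illustration; a real run would consult a geocoder.
GAZETTEER = {
    "Concord": (42.4604, -71.3489),
    "Athens": (37.9838, 23.7275),
}


def add_coordinates(xml_text, tag="LOCATION", gazetteer=GAZETTEER):
    """Return xml_text with lat/lon attributes added to known place elements.

    The tag name "LOCATION" is an assumption about the NER output.
    """
    root = ET.fromstring(xml_text)
    for place in root.iter(tag):
        name = (place.text or "").strip()
        coords = gazetteer.get(name)
        if coords is None:
            continue  # leave unknown places untouched
        lat, lon = coords
        place.set("lat", str(lat))
        place.set("lon", str(lon))
    return ET.tostring(root, encoding="unicode")


sample = (
    "<text>Thoreau lived near <LOCATION>Concord</LOCATION>, "
    "not <LOCATION>Sparta</LOCATION>.</text>"
)
enriched = add_coordinates(sample)
print(enriched)
```

Places missing from the lookup are left as they were, so the file can be run 
through the loop again as the lookup improves.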

I then got to thinking about names and organizations. It would be nice to 
supplement these entities with canonical Linked Data endpoints. My application 
could then read the endpoints, extract the links associated with them, and 
display some sort of graphic illustrating relationships. Finally, I could allow 
the reader to select a relationship for further investigation.

Given a name -- say, Plato or Thoreau -- how would one go about identifying 
good endpoints? What sort of query would I send to what sort of "database"? 
What might I get back? Assuming my goal is to enrich the text, what sort of 
link(s) should I insert into my XML?
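
To make the question concrete, here is one possible shape of an answer, 
sketched in Python. It assumes DBpedia's public SPARQL endpoint as the 
"database", an exact match on the English rdfs:label, a PERSON tag in the NER 
output, and a "ref" attribute to hold the link; every one of those choices is 
an assumption, and other datasets or attribute names may fit better. The 
SAMPLE_RESPONSE string is hand-written in the shape of the standard SPARQL 
JSON results format, not a captured response, so the sketch runs without 
touching the network.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: DBpedia's public SPARQL endpoint plays the role of the "database".
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_query(name, rdf_type="dbo:Person", limit=5):
    """Build a SPARQL query for resources whose English label equals name."""
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?uri WHERE {\n"
        f'  ?uri rdfs:label "{safe}"@en ;\n'
        f"       a {rdf_type} .\n"
        f"}} LIMIT {int(limit)}"
    )


def fetch_results(name):
    """Send the query and return the raw JSON text (not called in this demo)."""
    url = DBPEDIA_SPARQL + "?" + urllib.parse.urlencode({"query": build_query(name)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def candidate_uris(results_json):
    """Pull the ?uri bindings out of a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [b["uri"]["value"] for b in data["results"]["bindings"]]


# Hand-written sample in the SPARQL JSON results shape (not real output).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["uri"]},
    "results": {"bindings": [
        {"uri": {"type": "uri", "value": "http://dbpedia.org/resource/Plato"}}
    ]},
})

# Assumption: the NER output marks people with a PERSON tag.
doc = ET.fromstring("<text><PERSON>Plato</PERSON> wrote dialogues.</text>")
for person in doc.iter("PERSON"):
    uris = candidate_uris(SAMPLE_RESPONSE)  # or fetch_results(person.text)
    if uris:
        person.set("ref", uris[0])  # "ref" is a hypothetical attribute name

linked = ET.tostring(doc, encoding="unicode")
print(build_query("Plato"))
print(linked)
```

Taking the first candidate is the crudest possible choice; a common name will 
match several resources, so some disambiguation or human review would likely 
be needed before the link is written into the file.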

[1] NER - http://bit.ly/e0SnA6
[2] geo-location for WebKit mobile - http://bit.ly/msIu16

-- 
Eric Morgan
University of Notre Dame
