Eric,

Jean Godby and I have been looking into this very problem.

First, I want to draw your attention to the difference between NER and the subsequent problem of identity resolution. For example, in a given text, an NER tool would identify "Kennedy" as a name, but that name could refer to several different people. If you can get more information (dates, titles, etc.) from the text for a given reference, you can do a better job of resolving the correct identity.

Second, Jean and I planned to use WorldCat Identities [1] as our endpoint and as part of our identity resolution mechanism. With extra data, like a birth and/or death year, you can really zero in on an identity.
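To make the NER versus identity resolution distinction concrete, here is a minimal sketch in Python. It is not how WorldCat Identities works internally and it makes no network calls; the `Candidate` class, the `resolve` function, and the `example.org` URIs are all hypothetical placeholders. It only shows how a birth or death year picked up from the text can narrow several candidates for "Kennedy" down to one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One possible identity for an ambiguous name (hypothetical record shape)."""
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    uri: str  # placeholder URI, not a real endpoint


# A tiny hand-built candidate list standing in for a real authority lookup.
CANDIDATES = [
    Candidate("Kennedy, John F.", 1917, 1963, "http://example.org/identity/jfk"),
    Candidate("Kennedy, Robert F.", 1925, 1968, "http://example.org/identity/rfk"),
    Candidate("Kennedy, Edward M.", 1932, 2009, "http://example.org/identity/emk"),
]


def resolve(name: str,
            candidates: List[Candidate],
            birth_year: Optional[int] = None,
            death_year: Optional[int] = None) -> List[Candidate]:
    """Return the candidates consistent with the name and any known years."""
    # Step 1: what NER alone gives us, a bare name match.
    matches = [c for c in candidates if name.lower() in c.name.lower()]
    # Step 2: use extra data from the text, when present, to narrow the set.
    if birth_year is not None:
        matches = [c for c in matches if c.birth_year == birth_year]
    if death_year is not None:
        matches = [c for c in matches if c.death_year == death_year]
    return matches


if __name__ == "__main__":
    # The name alone is ambiguous; adding a birth year is not.
    print(len(resolve("Kennedy", CANDIDATES)))
    print(resolve("Kennedy", CANDIDATES, birth_year=1925))
```

In practice the candidate list would come from whatever authority service you query, and exact-year matching would probably need to be loosened to tolerate missing or approximate dates.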
[1] http://www.worldcat.org/identities

/dev
--
Devon Smith
Consulting Software Engineer
OCLC Office of Research
http://www.oclc.org/research/people/smith.htm

On Mon, May 16, 2011 at 8:33 AM, Eric Lease Morgan <emor...@nd.edu> wrote:
> What are some of the ways to best insert Linked Data endpoints into an XML
> file?
>
> I have been playing lately with named-entity recognition/extraction
> technology. [1] Feed a text file, such as a novel, into the recognition
> program. Get back a rudimentary XML file where things like names, places, and
> organizations are marked with simple tags. I can then extract all the place
> names from a text, tabulate them, display a word-cloud, allow the reader to
> select items, guess latitude and longitude of the place, and finally plot
> them on a map. [2] This process works pretty well, but Google Maps only
> allows me to plot a limited number of items at a time. Consequently, I am
> thinking about preprocessing my data by looping through the XML file and
> adding latitude and longitude attributes to the place name elements.
>
> I then got to thinking about names and organizations. It would be nice to
> supplement these entities with canonical Linked Data endpoints. My
> application could then read the endpoints, extract the links associated with
> them, and display some sort of graphic illustrating relationships. Finally, I
> could allow the reader to select a relationship for further investigation.
>
> Given a name -- say, Plato or Thoreau -- how would one go about identifying
> good endpoints? What sort of query would I send to what sort of "database"?
> What might I get back? Assuming my goal is to enrich the text, what sort of
> link(s) should I insert into my XML?
>
> [1] NER - http://bit.ly/e0SnA6
> [2] geo-location for WebKit mobile - http://bit.ly/msIu16
>
> --
> Eric Morgan
> University of Notre Dame
>

--
Sent from my GMail account.
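
The preprocessing step described in the quoted message (looping through the XML file and adding latitude and longitude attributes to the place name elements) could look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree`. The `<location>` tag name, the `lat`/`long` attribute names, the sample text, and the small `GAZETTEER` dictionary with approximate coordinates are all assumptions for illustration; the actual NER output and geocoding source may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table with approximate coordinates; a real application
# would query a geocoding service or gazetteer instead.
GAZETTEER = {
    "Athens": (37.98, 23.73),
    "Concord": (42.46, -71.35),
}

# Hypothetical NER output; the tag names are assumed, not taken from any tool.
SAMPLE = (
    "<text><person>Thoreau</person> lived near "
    "<location>Concord</location> and read about "
    "<location>Sparta</location>.</text>"
)


def add_coordinates(xml_text: str, gazetteer: dict, tag: str = "location") -> str:
    """Add lat/long attributes to every place element found in the gazetteer."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        place = (elem.text or "").strip()
        coords = gazetteer.get(place)
        if coords is not None:
            elem.set("lat", str(coords[0]))
            elem.set("long", str(coords[1]))
        # Places that are not in the gazetteer are left untouched.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_coordinates(SAMPLE, GAZETTEER))
```

The same loop could add a link attribute to name and organization elements once an identity has been resolved, so that the map and the relationship display both read from one enriched file.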