Hi, I can't answer your question -- sorry! But, I was curious about the NLP you describe. Are there algorithms available for determining negation automatically, and are they accurate?
Sincerely,
James

> -----Original Message-----
> From: 1world1love [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 20, 2007 9:48 AM
> To: [email protected]
> Subject: advice on integrating NLP engine during indexing
>
>
> Greetings all. I am new to Lucene and am looking for a little
> advice/direction/feedback on what I am trying to do. I want to index and
> query millions of documents that are unstructured and resemble
> crime/police/psychiatric reports; no problem, Lucene is perfect for this.
>
> The trick is that I need to exclude certain terms from the index such as
> those terms that are negated or information that could potentially identify
> people. I have a collection of natural language processing tools that are
> able to tag or remove/replace such terms.
>
> I need to design the indexing such that I can feed each document through
> these tools and then incorporate the results into the indexing strategy.
>
> As an example, if I have a report that has the phrase: "Mr. Smith has no
> history of violence against women prior to this event"
>
> The NLP engine would recognize the name Smith and the negation of the term
> "violence" and would tag them as such. I would then like to exclude those
> terms from the indexing as seems prudent.
>
> Another strategy I would like to look at is to include the tags in the index
> to incorporate them into the search engine. That is to say, whether a subject
> "likely" has a history of violence, "may" have a history of violence, or
> "does not" have a history of violence.
>
> I assume that I will need to design a custom analyzer to do this, but I was
> hoping to solicit any comments, advice, or general suggestions before I get
> started.
>
> Thanks in advance,
>
> j
>
>
> --
> View this message in context:
> http://www.nabble.com/advice-on-integrating-NLP-engine-during-indexing-tp14437913p14437913.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
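
For reference, the two strategies in the quoted message (dropping tagged terms, or keeping the tag in the index) could be sketched roughly as below. This is only an illustration in plain Java, not Lucene code: the Tag enum, the TaggedToken class, and the "NOT_" prefix are all hypothetical names made up for the sketch, and the tags are assumed to come from the external NLP tools. In Lucene itself this logic would presumably live in a custom TokenFilter used by the custom Analyzer the message mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TaggedTokenFilterSketch {

    /** Hypothetical tags that an external NLP engine might attach to a token. */
    public enum Tag { NONE, PERSON_NAME, NEGATED }

    /** A token together with its (assumed) NLP tag. */
    public static final class TaggedToken {
        final String text;
        final Tag tag;

        public TaggedToken(String text, Tag tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    /** Strategy 1: keep only untagged tokens, so names and negated terms never reach the index. */
    public static List<String> indexableTerms(List<TaggedToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (t.tag == Tag.NONE) {
                terms.add(t.text.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Strategy 2: still drop names, but keep negated terms under a marker prefix. */
    public static List<String> termsWithNegationMarkers(List<TaggedToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (TaggedToken t : tokens) {
            switch (t.tag) {
                case PERSON_NAME:
                    break; // identifying information is never indexed
                case NEGATED:
                    terms.add("NOT_" + t.text.toLowerCase(Locale.ROOT));
                    break;
                default:
                    terms.add(t.text.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hand-tagged stand-in for NLP output on "Smith has no history of violence".
        List<TaggedToken> tokens = Arrays.asList(
            new TaggedToken("Smith", Tag.PERSON_NAME),
            new TaggedToken("has", Tag.NONE),
            new TaggedToken("no", Tag.NONE),
            new TaggedToken("history", Tag.NONE),
            new TaggedToken("of", Tag.NONE),
            new TaggedToken("violence", Tag.NEGATED));
        System.out.println(indexableTerms(tokens));
        System.out.println(termsWithNegationMarkers(tokens));
    }
}
```

With the second strategy a query could match "NOT_violence" separately from "violence", which is one possible way to distinguish "does not have a history of violence" at search time. Whether a prefix, a separate field, or something else is the better fit is left open here.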
