I'm not even sure if it can be considered Named Entity Recognition, but what the hell...
so here's my problem... I was asked to retrieve a the named entities out of a collection of documents, and I've thought of two ways of doing so (not sure if either of them work)... a) index the documents by wrapping the whitespace analyzer with ngramanalyzerwrapper and then retrieving only the words which have 3 or more characters and start with a capital, filtering the "garbage" manually. b) creating my own analyzer which will only index ngrams that start with capital letters and then retrieving the indexed words. which would be the best solution (i know that neither of them is very correct, but it's supposed to be a quick fix :), and if it's the second one, how would i go about creating my own analyzer? (i've read lucene in action and it wasn't much help :s) Thanks in advance, Chris -- View this message in context: http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14291880.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]