I think you should write your own Analyzer and use:
* StandardTokenizer for tokenization and ACRONYM detection.
* StopFilter for stop-word handling.
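To make the intended result concrete, here is a plain-Java sketch (no Lucene on the classpath) of the same three steps: tokenize while keeping acronym dots, lower-case, then drop stop words. The class name AcronymPipelineSketch, the analyze() helper and the regular expression are all made up for illustration. The regex only approximates acronym detection and is not StandardTokenizer's actual grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: mimics the tokenizer -> lower-case -> stop-filter
// chain with the standard library, so the expected tokens are easy to see.
public class AcronymPipelineSketch {

    // First alternative: a dotted acronym such as U.K, U.K. or U.S.A. (assumed shape).
    // Second alternative: an ordinary run of letters or digits.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}(?:\\.\\p{L})+\\.?|[\\p{L}\\p{N}]+");

    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lower-case first so stop-word matching is case-insensitive,
            // mirroring LowerCaseFilter placed before StopFilter.
            String token = m.group().toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "and");
        System.out.println(analyze("The U.S.A. and the U.K signed", stopWords));
    }
}
```

Note that the stop-word set has to be lower-case here, because the comparison happens after lower-casing. The same applies to the Lucene chain when LowerCaseFilter runs before StopFilter.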
The Analyzer you write should override tokenStream() and do something like:

************************************************************
public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new LowerCaseFilter(result); // only if you also want lower-casing
    result = new StopFilter(result, stopWords); // stopWords: your own set of stop words
    return result;
}
************************************************************

Don't use StandardAnalyzer: it wraps StandardTokenizer with StandardFilter, which strips the dots from acronyms.

Shai

On Sun, Jul 19, 2009 at 8:53 AM, mitu2009 wrote:
> Hi,
>
> If I want Lucene to preserve the dots of acronyms (for example U.K, U.S.A.),
> which analyzer do I need to use, and how? I also want to pass a set of stop
> words to Lucene while doing this.
