> What I want to do is pass to a Lucene method some text, and have it return
> the text that it would normally put into the index.

The part of Lucene that does this is called the Analyzer. The Lucene
distribution includes quite a few Analyzers, and the right one depends on
the text you plan to process, so you'll still have to choose one. But the
Analyzer interface is exactly what you want: a Reader (a character stream)
in, tokens out.
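
To make "text in, tokens out" concrete, here is a plain-Java sketch with no Lucene dependency. It approximates what SimpleAnalyzer does (Lucene's javadoc describes it as a LetterTokenizer filtered through a LowerCaseFilter): every maximal run of letters becomes one token, lowercased. The class and method names are hypothetical and are not Lucene's API. With Lucene itself you would get tokens from the analyzer's tokenStream method, whose exact signature varies between versions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of SimpleAnalyzer-style behaviour; not Lucene code.
public class SimpleAnalyzerSketch {

    // Reads characters and emits each maximal run of letters, lowercased.
    public static List<String> analyze(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetter(ch)) {
                current.append(Character.toLowerCase(ch));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Convenience wrapper for in-memory strings.
    public static List<String> analyze(String text) {
        try {
            return analyze(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick-Brown fox, 2 dogs!"));
    }
}
```

Running main prints [the, quick, brown, fox, dogs]. The digit "2" disappears because, like a letter tokenizer, the sketch treats anything that is not a letter as a separator.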

Start with SimpleAnalyzer, and then you can get more sophisticated with
regard to tokenization (what constitutes a word) and normalization
(transformations applied to words, such as removing little words like
"and" and "of", or lopping the s off plurals).
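
The sketch below separates the two stages in plain Java. Tokenization splits on anything that is not a Unicode letter. Normalization lowercases each token, drops words on a small stop list, and naively strips a trailing "s". The stop list, the plural rule, and all names are hypothetical, chosen only for illustration. They are not Lucene's stop-word list or its stemmers, which are considerably more careful.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical tokenize-then-normalize pipeline; not Lucene code.
public class NormalizingSketch {

    // Illustrative stop list only; not Lucene's built-in list.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the");

    // Tokenization: a word is a maximal run of Unicode letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) { // leading separators yield an empty first piece
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Normalization: lowercase, drop stop words, naively lop a plural "s".
    static List<String> normalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String w = t.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(w)) {
                continue;
            }
            // Crude rule: keep short words and "-ss" endings such as "glass".
            if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
                w = w.substring(0, w.length() - 1);
            }
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize(tokenize("The Cats and the Dogs of Glass")));
    }
}
```

Running main prints [cat, dog, glass]. Keeping the two stages separate mirrors the idea of a tokenizer followed by filters, so each rule can be swapped out independently.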

