On Thu, 14 Mar 2002, Robert A. Decker wrote:
> Yes, unique terms. I've started looking at the StandardAnalyzer, and
> related classes, and I'll see if I can use them for what I want.
>
> Also, I'd like to massage the text a bit beyond just taking the unique
> terms. For example, common words should be removed (some of which are
> found in the StandardAnalyzer).
>
> In addition, I'd like words to be modified a bit as well. For example, the
> word 'application' should be changed to 'applic'. The word 'deploy' to
> 'deploi', 'deploying' to 'deploy', etc.
What you're describing is 'stemming', and the Porter stemmer is one
popular stemming algorithm; in Lucene you want the PorterStemFilter. See
the Lucene FAQ, section 2, #23 for information on Porter stemming, and #17
for an example of its use that is probably very close to what you want.
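
To make the shape of that pipeline concrete, here is a rough,
library-free sketch of the three steps under discussion: drop common
words, stem what is left, and keep only the unique terms. Everything in it
is an assumption for illustration: the class name UniqueTermSketch, the
tiny stop list, and above all toyStem, which is a crude suffix stripper
and NOT the Porter algorithm (it will not turn 'application' into
'applic'). In Lucene, the PorterStemFilter would take the place of the
stemmer argument.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

class UniqueTermSketch {
    // Small illustrative stop list (hypothetical; not Lucene's actual list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "is", "of", "the", "to"));

    // Crude stand-in for a real stemmer: strips a trailing "ing" or a
    // plural "s". This is NOT the Porter algorithm.
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 3 && term.endsWith("s") && !term.endsWith("ss")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Lowercase, split on non-alphanumerics, drop stop words, stem each
    // remaining token, and collect the results in a sorted set.
    static Set<String> uniqueTerms(String text, UnaryOperator<String> stemmer) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stemmer.apply(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "Deploying the application deploys the applications";
        // prints [application, deploy]
        System.out.println(uniqueTerms(text, UniqueTermSketch::toyStem));
    }
}
```

Passing the stemmer in as a parameter mirrors the way a stem filter is
chained onto a token stream: the stop-word and uniqueness steps stay the
same whichever stemming algorithm is plugged in.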
Regards,
Joshua O'Madadhain
[EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.