Re: Stemming Problem

Erick Erickson Tue, 18 May 2010 17:47:25 -0700

You can construct your own analyzer from a
pre-existing Tokenizer (e.g. WhitespaceTokenizer)
followed by one or more TokenFilters
(e.g. PorterStemFilter for stemming). You can
chain any number of TokenFilters together
to get many different effects.
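
To make the chaining idea concrete, here is a minimal sketch in plain Java.
It only models the tokenizer-plus-filters pipeline and is not Lucene's API:
the class AnalyzerSketch, the analyze helper, and the crude toyStem function
are all hypothetical stand-ins. In Lucene the corresponding pieces would be
a Tokenizer and a series of TokenFilters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of an analyzer: one tokenizer followed by a chain of filters.
// This is NOT Lucene's API; it only illustrates the shape of the pipeline.
class AnalyzerSketch {

    // Tokenizer stage: split on runs of whitespace, keeping case and punctuation.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Toy stemmer (assumption: a real stemmer such as Porter's is far more careful).
    // Strips a trailing "ing" or "s" from tokens longer than three characters.
    static String toyStem(String token) {
        if (token.length() > 3 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Analyzer: run the tokenizer, then apply each filter in order to every token.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<String>... filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }
}
```

For example, AnalyzerSketch.analyze("Indexing Dogs quickly", AnalyzerSketch::toyStem)
yields [Index, Dog, quickly]: the tokens are stemmed but capitalization is
left alone. Passing String::toLowerCase as a further filter would lowercase
them as well, which is the sense in which filters can be strung together.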

But I have to ask: why do you want to keep capitalization
and punctuation? Do you really want text indexed as
"Erickson, Erick" to fail to match the query
"erick erickson"? That's more often a source of
frustration than a benefit.
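
The mismatch can be sketched in plain Java. This is a hypothetical
illustration, not Lucene code: the class MatchSketch and its methods are made
up for the example, and it assumes a query matches only when every query term
appears verbatim among the indexed terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration (not Lucene code): exact term matching after two
// different kinds of analysis.
class MatchSketch {

    // Whitespace-only analysis: case and punctuation stay inside the terms.
    static Set<String> rawTerms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Normalizing analysis: lowercase, and treat any run of characters that
    // are not letters or digits as a separator.
    static Set<String> normalizedTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Assumption: a query matches only if every query term was indexed verbatim.
    static boolean matches(Set<String> indexed, Set<String> query) {
        return indexed.containsAll(query);
    }
}
```

With rawTerms, "Erickson, Erick" is indexed as the terms "Erickson," and
"Erick", so neither "erick" nor "erickson" is found and the query fails.
With normalizedTerms on both sides, the same query matches.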

HTH
Erick

On Tue, May 18, 2010 at 2:05 PM, Larry Hendrix <[email protected]> wrote:

> Hi,
>
> Right now I'm using Lucene with a basic WhitespaceAnalyzer but I'm having
> problems with stemming. Does anyone have a recommendation for other text
> analyzers that handle stemming and also keep capitalization, stop words, and
> punctuation?
>
> Thanks,
> Larry
>
>
> Larry A. Hendrix, Graduate Student
> Computer Science Department
> University of Wisconsin-Madison
> 1300 University Ave Rm 6749
> Madison, WI 53711
> Office: (608) 263-7624
> [email protected]
> Grambling State University Alum
>
>
