Marvin,

While a stemming analyzer can work well for general-purpose queries, if you're seeking a decent level of precision/recall, stemming often severely limits you. Moreover, unless the user is very familiar with the behavior of the stemmer used, some of the returned results can be quite surprising. While the logic of stemmers can, as you suggest, eliminate some false positives, it will at the same time introduce new ones, and false negatives as well.
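
To make the false positive / false negative point concrete, here is a minimal sketch using a deliberately naive plural-stripping rule. It is not Snowball, Porter, or any Lucene analyzer; `toy_stem` and `same_stem` are hypothetical names invented for this illustration, and real stemmers apply far more rules than this.

```python
def toy_stem(word):
    """Toy rule (an assumption for illustration): strip one trailing 's'.

    Words ending in 'ss' and words of three letters or fewer are left alone.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def same_stem(a, b):
    """Two terms 'match' when the toy rule reduces them to the same string."""
    return toy_stem(a) == toy_stem(b)


# Intended behavior: singular and plural are conflated, longer words are not.
cat_cats = same_stem("cat", "cats")
cat_cater = same_stem("cat", "cater")

# New false positive: two unrelated words collapse to the same stem.
new_news = same_stem("new", "news")

# False negative: an irregular plural is not conflated with its singular.
mouse_mice = same_stem("mouse", "mice")

print(cat_cats, cat_cater, new_news, mouse_mice)
```

With this toy rule, cat/cats match and cat/cater do not, but new/news also match and mouse/mice do not. A user who does not know the rule has no way to predict either surprise.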

I think the key is that, even if you have imprecise query demands that can be met by stemming, why limit Lucene's capability to achieve high levels of precision? Especially when the alternative (in terms of the cat? behavior) provides a capability (matching a specific number of characters) that very few applications apparently need?

Terry

Marvin Humphrey wrote:

Terry,

Is there a reason you wouldn't use a stemming analyzer of some kind, which would match cat and cats but not cater, catches, etc?

http://snowball.tartarus.org/demo.php

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

On Feb 21, 2006, at 3:13 PM, Terry Steichen wrote:

No, I don't think that the riot* option would work for many queries. Let's take a simple case where you want a singular or plural form, like either cat or cats (which would be very common). With 1.4.x, you can use cat? to retrieve such matches. With the new change, you need to use (cat cats) or (cat cat?). If you use cat*, you'll get a million matches you don't want (cater, catches, catwoman, category, catatonic, cataclysm, catamount, etc.). Or, take a case where you want to retrieve terms like elder, elderly, elders but do not want things like elderberry, elderdice. Or you want gun or guns, but not gunmen, gunshots, gunfire, gunpoint, gunston, etc.
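
The difference between the readings of cat? described above can be sketched as follows. This is only an illustration built on Python regular expressions, not Lucene's wildcard implementation; `wildcard_to_regex`, `matches`, and the `optional_qmark` flag are hypothetical names, and the term list is made up from the examples in this message.

```python
import re


def wildcard_to_regex(pattern, optional_qmark):
    """Translate a simple wildcard pattern into a compiled regex.

    '*' matches any run of characters (including none).
    '?' matches zero or one character when optional_qmark is True
    (the 1.4.x behavior as described in this thread), otherwise
    exactly one character (the behavior after the change).
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?" if optional_qmark else ".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, terms, optional_qmark=False):
    """Return the terms that the whole pattern matches, in input order."""
    rx = wildcard_to_regex(pattern, optional_qmark)
    return [t for t in terms if rx.fullmatch(t)]


# Hypothetical index terms taken from the examples in this message.
TERMS = ["cat", "cats", "cater", "catches", "category"]

lenient = matches("cat?", TERMS, optional_qmark=True)  # zero or one character
strict = matches("cat?", TERMS)                        # exactly one character
star = matches("cat*", TERMS)                          # any suffix

print(lenient)  # ['cat', 'cats']
print(strict)   # ['cats']
print(star)     # every term in TERMS
```

Under the strict reading, cat? alone no longer finds cat, which is why the query has to become (cat cats) or (cat cat?), while cat* pulls in every term with that prefix.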



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


