Marvin,
While a stemming analyzer can work well for general-purpose queries,
stemming often severely limits you if you're seeking a decent level of
precision/recall. Moreover, unless the user is very familiar with the
behavior of the stemmer used, some of the returned results can be
quite surprising. And while the logic of stemmers can, as you suggest,
eliminate some false positives, it will at the same time introduce new
ones, as well as false negatives.
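To make the false-positive/false-negative point concrete, here is a toy
sketch. It assumes a deliberately crude, hypothetical stemmer that just
strips one trailing "s" (the class and method names ToyStemmer, stem
and conflated are made up for illustration). It is not Snowball and not
any Lucene analyzer; real stemmers are far more sophisticated, but they
trade off the same two kinds of error.

```java
public class ToyStemmer {
    // Hypothetical, deliberately crude stemmer: strips one trailing "s"
    // unless the word ends in "ss". Not Snowball, not a Lucene analyzer.
    static String stem(String word) {
        String w = word.toLowerCase();
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Two terms "match" under stemming if they reduce to the same stem.
    static boolean conflated(String a, String b) {
        return stem(a).equals(stem(b));
    }

    public static void main(String[] args) {
        // Intended conflation: singular and plural share a stem.
        System.out.println(conflated("gun", "guns"));   // true
        // False positive: "news" is reduced to "new".
        System.out.println(conflated("news", "new"));   // true
        // False negative: the irregular plural is not handled.
        System.out.println(conflated("mouse", "mice")); // false
    }
}
```

The user searching for "news" has no way to tell from the query alone
that "new" will also match, which is the "surprising results" problem.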
I think the key question is this: even if some imprecise query needs
can be met by stemming, why limit Lucene's ability to achieve high
levels of precision? Especially when the alternative behavior for
cat? (matching a specific number of characters) provides a capability
that apparently very few applications need?
Terry
Marvin Humphrey wrote:
Terry,
Is there a reason you wouldn't use a stemming analyzer of some kind,
which would match cat and cats but not cater, catches, etc?
http://snowball.tartarus.org/demo.php
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
On Feb 21, 2006, at 3:13 PM, Terry Steichen wrote:
No, I don't think that the riot* option would work for many
queries. Let's take a simple case where you want a singular or
plural form, like either cat or cats (which would be very common).
With 1.4.x, you can use cat? to retrieve such matches. With the new
change, you need to use (cat cats) or (cat cat?). If you use cat*,
you'll get a million matches you don't want (cater, catches,
catwoman, category, catatonic, cataclysm, catamount, etc.). Or,
take a case where you want to retrieve terms like elder, elderly,
elders but do not want things like elderberry, elderdice. Or you
want gun or guns, but not gunmen, gunshots, gunfire, gunpoint,
gunston, etc.
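A minimal sketch of the difference, applied to the cat example above.
It assumes the post-change semantics described in this thread ('?'
matches exactly one character, '*' any run of characters) and models
them with a plain regex translation; it is not Lucene's WildcardQuery
code, and the names WildcardDemo, compile and match are hypothetical.
Passing several patterns to match() stands in for an OR query such as
(cat cat?).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardDemo {
    // Translate a wildcard pattern into a regex: '*' = any run of
    // characters, '?' = exactly one character, everything else literal.
    static Pattern compile(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    // Return the terms matched by ANY of the given patterns (an OR query).
    static List<String> match(List<String> terms, String... patterns) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            for (String p : patterns) {
                if (compile(p).matcher(term).matches()) {
                    hits.add(term);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("cat", "cats", "cater", "catches",
                "catwoman", "category", "catatonic", "cataclysm", "catamount");
        System.out.println(match(terms, "cat*"));        // all nine terms
        System.out.println(match(terms, "cat?"));        // [cats]
        System.out.println(match(terms, "cat", "cat?")); // [cat, cats]
    }
}
```

Under these semantics cat? alone no longer returns "cat", so the
singular has to be added back explicitly, while cat* drags in every
cat-prefixed term.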
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]