On Friday 30 May 2003 09:55, Leo Galambos wrote: > Ah, I got it. THX. In the good old days, the wildcards were used as a > fix for missing stemming module. I am not sure if you can combine these > two opposite approaches successfully. I see the following drawbacks of > your solution. > > Example: > built* (->built) could be changed to build* (no built, but ->builder, > building, etc.), and precision will go down drastically. > > You probably use a stemmer with one important bug (a.k.a. feature) - > overstemming, so here is another example: > political* (->political, politically) is transformed to polic* > (->policer, policy, policies, policement etc.) by Porter alg., and the > precision is again affected drastically
Yes, this is the exact problem that was brought up last time this was discussed. It may not be a very common problem (most of the time stemming a wildcard part probably works ok, somebody had tried that), but still a potential one. And that's why default lower casing was added, as it solved one of FAQs. It is much more common that analyzer used for non-wildcard query does lower casing, than not, and thus default setting (which leads to having to turn feature off by some users) seems to make sense. More general problem then is that there's no real way to stem "foo?ar", or any non-prefix wildcard query, but that could be figured out by QueryParser if necessary. -+ Tatu +- --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
