On Monday 24 February 2003 05:22, Volker Luedeling wrote:
> I made a small mistake in my example. My application converted all
> characters to lowercase while indexing. When I comment this out,
> "Etagenwohnung" remains unchanged after stemming. So, my example is bad.
> However, the basic problem remains (at least for all words that do not
> start with a capital letter). Take a word like "genaugenommen", for
> example. It will be stemmed to "nomm", and no real fuzzy or wildcard
> evaluation is possible.

Yes, this problem has been discussed on the list lately. You may want to read the mailing list archives to see some of the difficulties in finding a good general solution. (Brief summary: it is likely that no single solution can work 100% reliably; it depends on the language of the content, on the body of the wildcard term used, and so on.)

Fortunately, it is fairly easy (especially after the patches) to create your own query parser by extending the default one. In that parser you can run an analyzer on wildcard queries too. The only change you have to make to the default analyzer(s) is to ensure that wildcards remain in the query term, i.e. that '*' and '?' are not removed, and that these characters do not confuse the stemmer (which may not be trivial to do, actually).

Hope this helps,

-+ Tatu +-
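
P.S. To illustrate the "wildcards must remain in the query term" part, here is a minimal sketch in plain Java. It is not Lucene code: the class WildcardTermNormalizer and its methods are hypothetical names made up for this example, and lowercasing stands in for whatever the index-time analyzer does. The sketch only shows one way to normalize the literal pieces of a term while copying '*' and '?' through untouched. It does not attempt stemming; running a stemmer on fragments of a word is exactly the part that may not be trivial.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene: illustrates the idea only.
final class WildcardTermNormalizer {

    private WildcardTermNormalizer() {}

    // Applies `normalizer` to each run of ordinary characters and copies
    // '*' and '?' through unchanged, so the wildcards survive.
    static String normalize(String term, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                // Flush the literal run collected so far, then keep the wildcard.
                out.append(normalizer.apply(literal.toString()));
                literal.setLength(0);
                out.append(c);
            } else {
                literal.append(c);
            }
        }
        // Flush whatever follows the last wildcard (or the whole term).
        out.append(normalizer.apply(literal.toString()));
        return out.toString();
    }

    // Stand-in for the index-time analyzer (assumption): lowercasing only,
    // no stemming.
    static String lowercase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

A custom query parser could call something like WildcardTermNormalizer.normalize(term, WildcardTermNormalizer::lowercase) on a wildcard term before building the query, so that the term matches what was indexed in lowercase.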
