Tatu Saloranta wrote:

> - Stemming can only be done for prefix queries (what is stem of,
> say, "h�*er"?), and even then it might not produce stem one would
> want. For example, for prefix query "men*" might be 'stemmed' to
> "man*", and user might be perplexed at why documents with
> words like "meningitis" and "menstrual" did not match (ok, that is
> a contrived example, but hope you get the idea).

Good point. It is really amazing how different and complex languages are ;)

> In a way, you could think that user is doing "manual stemming", using
> a stem of a word with prefix query.

Yup, but for example the German word "Möllemann" is a surname, so there is
nothing to stem. If you search for "möllema*" now, you won't get any results
because "Möllemann" is indexed as "mollemann". OK, I admit it would be
uncommon to search for "möllema*" if you want to find occurrences of
"Möllemann". But this is only an example.

> In case of german, if umlaut chars are typically converted, perhaps
> you could create a GermanQueryParser.java that just extends default
> query parser, and does necessary transformation for wildcard/prefix
> queries? Since there already exists separate language-dependant
> stemmers, this might make sense?

Yep, this would be worth a try. But I'm not sure whether it really solves
all the problems. I'm still trying to get the whole picture of the problem ;)

>> is also the problem with Umlauts (ä,ö,ü) turned into vowels (a,o,u)
>> while indexing. An example: "Häuser" is the plural of "Haus". If I
>> index "Häuser" it is stemmed to "hau". If I do for example a search
>> for "häus*" nothing is
>
> Not "haus"?

I meant "häus*", but I admit it would be more natural to search for
"haus" ;). Perhaps I'm trying to find problems where there are none ;).
But it really depends on how you use Lucene.

>> and defaults to off. Hope you understand my problem ;)
>
> Yes I do... I don't even dare to think of problems finnish analyzer
> might have, with stemming. :-)

;)

A bit confused
Christoph
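
To make the "necessary transformation for wildcard/prefix queries" idea a bit more concrete, here is a minimal sketch of just the character folding such a parser might apply to a wildcard term before it is searched. It deliberately uses no Lucene classes: the class name UmlautFolding and the method foldWildcardTerm are hypothetical, and how this would be wired into a query parser subclass is left open. It also assumes the index folds only ä, ö, ü to a, o, u and lowercases terms, as described above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds umlauts in a
// wildcard/prefix term the same way the index is assumed to fold them.
public final class UmlautFolding {
    private UmlautFolding() {}

    /**
     * Lowercases the term and replaces ä, ö, ü with a, o, u.
     * Wildcard characters such as '*' and '?' pass through unchanged.
     */
    public static String foldWildcardTerm(String term) {
        String lower = term.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break; // ä
                case '\u00f6': out.append('o'); break; // ö
                case '\u00fc': out.append('u'); break; // ü
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "möllema*" written with a Unicode escape for the ö
        System.out.println(foldWildcardTerm("m\u00f6llema*")); // prints mollema*
    }
}
```

With this folding, "möllema*" becomes "mollema*" and would match the indexed "mollemann". It does not touch the stemming side of the problem: "häus*" becomes "haus*", which still would not match a term stemmed to "hau".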
