On Wednesday 12 February 2003 11:39, Christoph Kiehl wrote:
> Hi Doug,
>
> > Also, I think we should lowercase prefix and wildcard queries by ...
> > wildcard searches. What do others think?
>
> For the StandardAnalyzer this might work, but for the GermanAnalyzer, there

Solving this problem should be easier after the refactoring: just override
'getPrefixQuery()' and 'getWildcardQuery()' (see below for one possible idea
of what could be done). Another possibility would be to add a property that
enables using the same analyzer for wildcard/prefix queries as for normal
terms.

However, running wildcard/prefix terms through a typical analyzer is usually
not what one wants, for a couple of reasons:

- Wildcards are discarded by the analyzer, so the wildcard query gets broken
  (i.e. one needs a wildcard-character-aware analyzer).
- Stemming can only be done for prefix queries (what is the stem of, say,
  "hä*er"?), and even then it might not produce the stem one would want.
  For example, the prefix query "men*" might be 'stemmed' to "man*", and the
  user might be perplexed as to why documents with words like "meningitis"
  and "menstrual" did not match (ok, that is a contrived example, but I hope
  you get the idea). In a way, you could think of the user as doing "manual
  stemming", using the stem of a word with a prefix query.

In the case of German, if umlaut characters are typically converted, perhaps
you could create a GermanQueryParser.java that just extends the default query
parser and does the necessary transformation for wildcard/prefix queries?
Since separate language-dependent stemmers already exist, this might make
sense?

> is also the problem with Umlauts (ä,ö,ü) turned into vowels (a,o,u) while
> indexing. An example: "Häuser" is the plural of "Haus". If I index "Häuser"
> it is stemmed to "hau". If I do for example a search for "häus*" nothing is

Not "haus"?

> found, because "häus" is not stemmed. If I would analyze "häus*" I should
> get "hau*". The problem is, that now you do not only get "Häuser" but also
> "Haus" as result. But I think it is better to get more results than no
> result. This is perhaps a special problem with the GermanAnalyzer. May be
> there could be an option to use the Analyzer also for wildcard queries. So
> I can turn it on in my case and defaults to off.
> Hope you understand my problem ;)

Yes I do... I don't even dare to think of the problems a Finnish analyzer
might have with stemming. :-)

-+ Tatu +-
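
To make the GermanQueryParser idea a bit more concrete, here is a minimal, self-contained sketch of the kind of term transformation such a subclass might apply inside its overridden 'getPrefixQuery()' / 'getWildcardQuery()'. The class name GermanWildcardTerms and the method normalize() are hypothetical and not part of Lucene; the sketch also assumes that the indexing analyzer folds ä/ö/ü to a/o/u, as described in the thread. It only lowercases and folds umlauts, leaving '*' and '?' untouched, and does no stemming, so whether the result matches still depends on what the analyzer actually put in the index.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: a sketch of the transformation a
// GermanQueryParser subclass could apply to wildcard/prefix terms.
public class GermanWildcardTerms {

    // Lowercases the term and folds the German umlauts to plain vowels.
    // Wildcard characters ('*' and '?') pass through unchanged, because they
    // fall into the default branch like any other character.
    public static String normalize(String termStr) {
        String lower = termStr.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break; // a-umlaut
                case '\u00f6': out.append('o'); break; // o-umlaut
                case '\u00fc': out.append('u'); break; // u-umlaut
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prefix term from the thread, written with a Unicode escape.
        System.out.println(normalize("H\u00e4us*")); // prints "haus*"
    }
}
```

A parser subclass would pass the normalized string on to the default prefix/wildcard handling instead of the raw user input; that wiring is left out here because it depends on the refactored QueryParser signatures.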
