Hi,

> > Also, I think we should lowercase prefix and wildcard queries by
> > default. This would fix one of the most frequently reported problems.
> > Yes, it might also break folks who currently do case-sensitive
> > wildcard queries, but I suspect they are far fewer than those who will
> > continue to complain about the default case-sensitivity of wildcard
> > searches. What do others think?
>
> For the StandardAnalyzer this might work, but for the
> GermanAnalyzer, there is also the problem with Umlauts
> (ä,ö,ü) turned into vowels (a,o,u) while indexing. An
> example: "Häuser" is the plural of "Haus". If I index
> "Häuser" it is stemmed to "hau". If I do for example a search
> for "häus*" nothing is found, because "häus" is not stemmed.
> If I would analyze "häus*" I should get "hau*". The problem
> is, that now you do not only get "Häuser" but also "Haus" as
> result. But I think it is better to get more results than no
> result. This is perhaps a special problem with the
> GermanAnalyzer. May be there could be an option to use the
> Analyzer also for wildcard queries. So I can turn it on in my
> case and defaults to off. Hope you understand my problem ;)

I second that. The same holds for many languages: a "standard" analyzer
will usually do more than lowercase the text. It will also strip
diacritics, as in the example above, and possibly stem the terms as well.
Lucene is a wonderful tool for building i18n-ready search engines, let's
not forget it ;-)

Martin Sévigny
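
To make the diacritics point concrete, here is a minimal sketch of what normalizing a wildcard term before searching might look like. It uses only the Java standard library, not Lucene's analyzer or query parser API. The class `WildcardTermNormalizer` and its `normalize` method are hypothetical names invented for this illustration. The sketch lowercases the term and folds accented letters to their base letters, so "Häus*" becomes "haus*". It does not stem, so it would still not produce the "hau*" that the GermanAnalyzer case needs.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Lucene): normalizes the text of a
 * prefix or wildcard query term the way a simple analyzer might.
 */
final class WildcardTermNormalizer {

    private WildcardTermNormalizer() {}

    /**
     * Lowercases the term and removes combining diacritical marks.
     * Wildcard characters such as '*' and '?' pass through unchanged.
     * No stemming is attempted.
     */
    static String normalize(String term) {
        // Decompose accented letters into base letter + combining mark (NFD).
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode general category Mn).
        String folded = decomposed.replaceAll("\\p{Mn}+", "");
        // Locale.ROOT avoids locale-specific surprises when lowercasing.
        return folded.toLowerCase(Locale.ROOT);
    }
}
```

Whether such folding should be on by default is exactly the open question in this thread. An opt-in switch, as suggested above, would keep existing case-sensitive wildcard queries working.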
