I made a small mistake in my example. My application converted all characters to lowercase while indexing. With that step commented out, "Etagenwohnung" remains unchanged after stemming, so the example was a bad one. The basic problem remains, however, at least for all words that do not start with a capital letter. Take "genaugenommen", for example: it is stemmed to "nomm", so no meaningful fuzzy or wildcard evaluation is possible against the index.
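
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use Lucene or the real German stemmer: `StemMismatchDemo`, `REPORTED_STEMS` and all of its methods are hypothetical names, and the stems are simply the ones reported in this thread (with lowercasing switched on). It only assumes what was observed here, namely that plain term queries are stemmed before lookup while wildcard patterns are compared against the stemmed index terms as typed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class StemMismatchDemo {
    // Stand-in for the real stemmer: only the stems reported in this thread.
    static final Map<String, String> REPORTED_STEMS = new LinkedHashMap<>();
    static {
        REPORTED_STEMS.put("etagenwohnung", "nwohnung");
        REPORTED_STEMS.put("genaugenommen", "nomm");
    }

    // Lowercase, then look up the reported stem; unknown words pass through.
    static String stem(String word) {
        String lower = word.toLowerCase();
        return REPORTED_STEMS.getOrDefault(lower, lower);
    }

    // The "index" is just the list of stemmed terms.
    static List<String> index(String... words) {
        List<String> terms = new ArrayList<>();
        for (String w : words) {
            terms.add(stem(w));
        }
        return terms;
    }

    // A plain term query is stemmed before lookup, so it matches.
    static boolean termQuery(List<String> indexTerms, String query) {
        return indexTerms.contains(stem(query));
    }

    // A wildcard query is NOT stemmed: the pattern is matched against
    // the stemmed index terms exactly as the user typed it.
    static List<String> wildcardQuery(List<String> indexTerms, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(regex.toString());
        List<String> hits = new ArrayList<>();
        for (String term : indexTerms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = index("Etagenwohnung", "genaugenommen");
        System.out.println(termQuery(terms, "genaugenommen"));     // true
        System.out.println(wildcardQuery(terms, "genaugenomm*"));  // []
        System.out.println(wildcardQuery(terms, "nwoh*"));         // [nwohnung]
    }
}
```

The pattern a user would naturally type ("genaugenomm*") finds nothing, while a pattern written against the stem ("nwoh*") does match, which is the same behaviour described above.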
-----Original Message-----
From: Volker Luedeling
Sent: Monday, 24 February 2003 11:58
To: [EMAIL PROTECTED]
Subject: Wildcard and Fuzzy queries in GermanAnalyzer

Hi,

I have noticed that FuzzyQueries and WildcardQueries don't do stemming. Since all terms in the index are in stemmed forms, this causes some problems:

"Etagenwohnung" gets stemmed to "nwohnung". So a search for "Etagenwohnung" will find "Etagenwohnung" and "nwohnung".

Fuzzy search for "Etagenwohnung~" finds neither of those, but it will find "Nebenwohnung".

Wildcard search for "Etagenw?hnung" also finds neither of the two documents, while "nwoh*" finds "Etagenwohnung", which is also not what a user would expect.

It seems that stemming has a fundamental problem with these kinds of tolerant searches. Does anyone know how to resolve these issues? Do you plan to tackle this problem in a future release of Lucene?

Volker

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
