http://issues.apache.org/bugzilla/show_bug.cgi?id=30785

German Analyzer does not handle search terms with asterisks

           Summary: German Analyzer does not handle search terms with asterisks
           Product: Lucene
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Search
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I created a test set of UTF-8 text files containing special German characters
and indexed them with GermanAnalyzer, using an adapted version of the Lucene
demo program IndexFiles.java.

However, whenever I use the wildcard asterisk (*), the QueryParser in the demo
SearchFiles.java returns the search term with the original German umlauts or
sz letters (ß) intact. It does not replace the umlauts and sz letters before
performing the search, as I would expect it to.

Examples:

Using the GermanAnalyzer, QueryParser returns these words in lower case, but
with the German umlaut letters unchanged in the parsed queries:

Bürger*, Schlüssel*, Währ*, Straß*, herkömm, städt*, Ä*, Ö*, Ü*.

(In case the above words appear with broken letters, here they are in ASCII
form: Buerger*, Schluessel*, Waehr*, Strasz*, herkoemm, staedt*, AE*, OE*, UE*).

This leads to Lucene exceptions such as BooleanQuery.TooManyClauses in our
system, which uses Lucene.
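
A possible workaround, sketched here under stated assumptions, is to fold the
special characters in a wildcard term yourself before handing it to
QueryParser, since the report shows that wildcard terms come back lowercased
but otherwise unanalyzed. The class name GermanWildcardFolding and the method
fold are hypothetical and not part of Lucene. The mapping (umlaut to base
vowel, ß to "ss") is an assumption about how the indexed terms were
normalized; it is not stated in the report and should be checked against the
actual index. The sketch does not apply stemming, so a folded prefix may still
miss terms that the analyzer shortened at index time.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class GermanWildcardFolding {

    // Lowercases a wildcard term and folds German special characters.
    // ASSUMPTION: the index holds terms with umlauts reduced to their base
    // vowel and sharp s expanded to "ss"; verify this against your own index.
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a with umlaut
                case '\u00f6': out.append('o'); break;   // o with umlaut
                case '\u00fc': out.append('u'); break;   // u with umlaut
                case '\u00df': out.append("ss"); break;  // sharp s
                default: out.append(c);                  // keeps '*' and '?' as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three of the terms from the report, written with Unicode escapes
        // so the source file does not depend on its encoding.
        String[] samples = {"B\u00fcrger*", "Stra\u00df*", "\u00c4*"};
        for (String s : samples) {
            System.out.println(fold(s));
        }
    }
}
```

With this helper, fold("Bürger*") yields burger* and fold("Straß*") yields
strass*. Very short prefixes such as a* can still expand to a large number of
terms, so folding alone is unlikely to prevent BooleanQuery.TooManyClauses in
every case.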