DO NOT REPLY [Bug 30785] New: - German Analyzer does not handle search terms with asterisks

bugzilla Sat, 21 Aug 2004 08:04:45 -0700

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=30785>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://issues.apache.org/bugzilla/show_bug.cgi?id=30785

German Analyzer does not handle search terms with asterisks

           Summary: German Analyzer does not handle search terms with
                    asterisks
           Product: Lucene
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Search
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


I created a test set of UTF-8 text files containing special German characters,
which I then indexed using GermanAnalyzer and an adapted version of the Lucene
demo program IndexFiles.java. However, whenever I use the wildcard asterisk (*),
the QueryParser in the demo SearchFiles.java always returns the search term with
the original German umlauts and sz (ß) letters intact. It does not replace the
umlauts and ß letters before performing the search, as I would expect it to.
Examples: using the GermanAnalyzer, QueryParser returns these words in lower
case, but with the German umlaut letters unchanged in the parsed queries:
Bürger*, Schlüssel*, Währ*, Straß*, herkömm, städt*, Ä*, Ö*, Ü*.
(In case the letters above appear garbled, here are the same words in ASCII:
Buerger*, Schluessel*, Waehr*, Strasz*, herkoemm, staedt*, AE*, OE*, UE*.)
This leads to Lucene exceptions such as BooleanQuery.TooManyClauses in our
Lucene-based system.
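As the report observes, QueryParser only lower-cases wildcard terms instead of
running them through GermanAnalyzer, so they never receive the umlaut
substitution that the indexed terms did. One possible workaround is to fold the
wildcard term yourself before building the query. The Java sketch below is
hypothetical: the class and method names are illustrative, and the mapping
ä->a, ö->o, ü->u, ß->ss is an ASSUMPTION about what the indexing analyzer
produced, not a copy of GermanAnalyzer's code. It does not reproduce stemming,
so a folded prefix may still miss stemmed index terms.

```java
import java.util.Locale;

// Hypothetical helper: folds a wildcard term the way we ASSUME the indexing
// analyzer folded German letters. Stemming is NOT reproduced here.
final class WildcardTermFolder {

    // Assumed mapping: ä->a, ö->o, ü->u, ß->ss (verify against your analyzer).
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.GERMAN); // Ä -> ä, keeps ß as ß
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;  // ä
                case '\u00f6': out.append('o'); break;  // ö
                case '\u00fc': out.append('u'); break;  // ü
                case '\u00df': out.append("ss"); break; // ß
                default: out.append(c); // other letters, '*' and '?' pass through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source-file encoding.
        String[] terms = {
            "B\u00fcrger*",    // Bürger*
            "Schl\u00fcssel*", // Schlüssel*
            "Stra\u00df*",     // Straß*
            "\u00c4*"          // Ä*
        };
        for (String t : terms) {
            System.out.println(t + " -> " + fold(t));
        }
    }
}
```

The folded string could then be used to build the query directly, for example
with `new WildcardQuery(new Term(field, folded))`, instead of letting
QueryParser pass the unfolded term through.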

