Andrzej Bialecki wrote:
> Thomas Scheffler wrote:
>
>> Hi,
>>
>> is it possible with Lucene to use umlaut normalisation?
>> For example Query: Hühnerstall --> Query: Huehnerstall.
>>
>> This of course implies that the documents were indexed with normalized
>> umlauts.
>> This issue is very important, because not everyone searching
>> German documents has a German keyboard.
>
> It seems to me the best place would be to put this replacement in a
> custom Analyzer (perhaps extend GermanAnalyzer?).
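A minimal sketch of the replacement Andrzej suggests, assuming the goal is to expand ä/ö/ü/ß to ae/oe/ue/ss so that "Hühnerstall" and "Huehnerstall" produce the same term. `UmlautNormalizer` is a hypothetical helper, not part of Lucene; in a real setup its logic would sit inside a custom TokenFilter used by the Analyzer at both index time and query time.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not a Lucene class): lower-cases a term and
 * expands German umlauts and sharp s to their two-letter ASCII spellings.
 */
public final class UmlautNormalizer {

    private UmlautNormalizer() {}

    /** Lower-cases the input, then maps ä->ae, ö->oe, ü->ue, ß->ss. */
    public static String normalize(String term) {
        String lower = term.toLowerCase(Locale.GERMAN); // Ü becomes ü, etc.
        StringBuilder out = new StringBuilder(lower.length() + 4);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // ä
                case '\u00f6': out.append("oe"); break; // ö
                case '\u00fc': out.append("ue"); break; // ü
                case '\u00df': out.append("ss"); break; // ß
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both spellings end up as the same term.
        System.out.println(normalize("H\u00fchnerstall")); // huehnerstall
        System.out.println(normalize("Huehnerstall"));     // huehnerstall
    }
}
```

Because the same function runs on indexed text and on query text, a user without a German keyboard typing "Huehnerstall" hits the same term as one typing "Hühnerstall".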
I thought it would already be available in some form, since it's supported in other major text search engines, for example NSE from IBM. Why not in Lucene?

>
>> This brings me to the next problem. Currently only Luke delivers results
>> for "Hühnerstall"; my self-implemented solution always turns it into
>> "huhnerstall" in the query (why?). But there is no "huhnerstall"
>> indexed.
>>
>
> Please check which Analyzer you're using in each case.
>

DEBUG Query: MyCoReDemoDC_derivate_0014-->Hühnerstall
DEBUG Set DerivateID to MyCoReDemoDC_derivate_0014 for next query...
DEBUG parsing query using: org.apache.lucene.analysis.de.GermanAnalyzer
DEBUG adding clause: content:huhnerstall
DEBUG preparsed query:(+DerivateID:MyCoReDemoDC_derivate_0014 +content:huhnerstall)

It's the GermanAnalyzer. No matter what I choose in Luke, it always finds documents for "Hühnerstall", but I can't find them with my own code. My extended QueryParser overrides parse(): it logs the Analyzer in use and then parses the string with super.parse(String). The resulting Query is put in a BooleanClause and later combined with the first part you see above (a field query using WhitespaceAnalyzer) into a new Query. So one part of the query uses WhitespaceAnalyzer and the other GermanAnalyzer. But I don't know why Hühnerstall gets turned into huhnerstall.
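A small pure-Java illustration of why the analyzer matters on both sides, assuming a hypothetical index whose terms were only lower-cased. The debug output shows GermanAnalyzer turning Hühnerstall into huhnerstall, i.e. folding ü to u (this appears to come from the umlaut substitution in Lucene's GermanStemmer). `fold` mimics only that umlaut folding and ignores the stemming GermanAnalyzer also performs; `lowerOnly` is an assumed index-time behaviour. Neither method is a Lucene API.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical illustration: an exact term lookup only succeeds when the
 * query term went through the same normalization as the indexed terms.
 */
public final class AnalyzerMismatchDemo {

    /** Folds umlauts to the bare vowel, like the "huhnerstall" in the debug log. */
    public static String fold(String term) {
        return term.toLowerCase(Locale.GERMAN)
                .replace('\u00e4', 'a')  // ä -> a
                .replace('\u00f6', 'o')  // ö -> o
                .replace('\u00fc', 'u'); // ü -> u
    }

    /** Only lower-cases and keeps umlauts (assumed index-time analysis). */
    public static String lowerOnly(String term) {
        return term.toLowerCase(Locale.GERMAN);
    }

    /** True if the analyzed query term is present in the set of indexed terms. */
    public static boolean matches(Set<String> index, String query,
                                  UnaryOperator<String> analyzer) {
        return index.contains(analyzer.apply(query));
    }

    public static void main(String[] args) {
        String word = "H\u00fchnerstall";
        // Hypothetical index built with lowerOnly: contains "hühnerstall".
        Set<String> index = Set.of(lowerOnly(word));
        System.out.println(matches(index, word, AnalyzerMismatchDemo::lowerOnly)); // true
        System.out.println(matches(index, word, AnalyzerMismatchDemo::fold));      // false
    }
}
```

When index-time and query-time analysis disagree, the term query simply finds nothing; using the same Analyzer (or the same custom normalization) for indexing and for query parsing removes the mismatch.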
