Re: umlaut normalisation

Thomas Scheffler Tue, 27 Jan 2004 06:23:46 -0800

Andrzej Bialecki said:
> Thomas Scheffler wrote:
>
>> Hi,
>>
>> is it possible to use umlaut normalisation with Lucene?
>> For example Query: Hühnerstall --> Query: Huehnerstall.
>>
>> This of course requires that the document was indexed with normalized
>> umlauts.
>> This issue is very important, because not everyone starting a search
>> against German documents may have a German keyboard.
>
> It seems to me the best place would be to put this replacement in a
> custom Analyzer (perhaps extend GermanAnalyzer?).


I thought it would already be available somehow, since it's supported in
other major text search engines, for example NSE from IBM. Why not in
Lucene?
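
A minimal sketch of the replacement being discussed, in plain Java with no
Lucene dependency. The class name UmlautNormalizer and its normalize method
are hypothetical and not part of Lucene. The same mapping would have to be
applied both when indexing and when parsing the query, for example inside a
custom Analyzer.

```java
// Hypothetical helper, not a Lucene class: rewrites German umlauts and
// sharp s into their ASCII transliterations (ü -> ue, ß -> ss, ...).
final class UmlautNormalizer {

    private UmlautNormalizer() {}

    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // ä
                case '\u00f6': out.append("oe"); break; // ö
                case '\u00fc': out.append("ue"); break; // ü
                case '\u00c4': out.append("Ae"); break; // Ä
                case '\u00d6': out.append("Oe"); break; // Ö
                case '\u00dc': out.append("Ue"); break; // Ü
                case '\u00df': out.append("ss"); break; // ß
                default: out.append(c);                 // everything else unchanged
            }
        }
        return out.toString();
    }
}
```

With this, both "Hühnerstall" and "Huehnerstall" end up as the same string,
so a user without a German keyboard can type the second form.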

>
>> This brings me to the next problem. Currently only Luke delivers results
>> for "Hühnerstall"; my self-implemented solution always makes
>> "huhnerstall" out of it in the query (why?). But there is no
>> "huhnerstall" indexed.
>>
>
> Please check which Analyzer you're using in each case.
>

DEBUG Query: MyCoReDemoDC_derivate_0014-->Hühnerstall
DEBUG Set DerivateID to MyCoReDemoDC_derivate_0014 for next query...
DEBUG parsing query using: org.apache.lucene.analysis.de.GermanAnalyzer
DEBUG adding clause: content:huhnerstall
DEBUG preparsed query:(+DerivateID:MyCoReDemoDC_derivate_0014
+content:huhnerstall)

It's the GermanAnalyzer. No matter which analyzer I choose in Luke, it
always finds documents for "Hühnerstall", but I'm not able to find them
with my own code. My extended QueryParser overrides parse. It logs the
Analyzer and then parses the String with super.parse(String). The
resulting Query is put in a BooleanClause and later combined with the
first part (the field query using WhitespaceAnalyzer) you see above into
a new Query.
So one query part uses the WhitespaceAnalyzer and the other the
GermanAnalyzer. But I don't know why Hühnerstall becomes huhnerstall.
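
To illustrate the kind of mismatch in the log above, here is a small sketch
in plain Java. It is only an assumption about the cause, not a confirmed
diagnosis: it supposes that the query-side analysis folds "ü" to "u" while
the indexed term kept its umlaut. The class AnalyzerMismatchDemo and its
methods are hypothetical stand-ins, not Lucene APIs.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins for two different analysis chains.
final class AnalyzerMismatchDemo {

    private AnalyzerMismatchDemo() {}

    // Lowercases only; the umlaut survives.
    static String lowercaseOnly(String term) {
        return term.toLowerCase(Locale.GERMAN);
    }

    // Lowercases and folds umlauts to their base vowel (ü -> u),
    // which is the form seen in the debug output ("huhnerstall").
    static String foldToBaseVowel(String term) {
        return term.toLowerCase(Locale.GERMAN)
                   .replace('\u00e4', 'a')
                   .replace('\u00f6', 'o')
                   .replace('\u00fc', 'u');
    }

    // A term query only hits if the exact term is in the index.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String word = "H\u00fchnerstall";
        Set<String> index = Collections.singleton(lowercaseOnly(word));
        // Different chains at index and query time: no hit.
        System.out.println(matches(index, foldToBaseVowel(word)));
        // Same chain on both sides: hit.
        System.out.println(matches(index, lowercaseOnly(word)));
    }
}
```

If this is what happens, the fix would be to use the same analysis for the
content field at index time and at query time.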
