Take a look at the end of GermanAnalyzer.java:

http://cvs.apache.org/viewcvs/jakarta-lucene/src/java/org/apache/lucene/analysis/de/GermanAnalyzer.java?rev=1.2&content-type=text/vnd.viewcvs-markup


        public final TokenStream tokenStream( String fieldName, Reader reader ) {
                TokenStream result = new StandardTokenizer( reader );
                result = new StandardFilter( result );
                result = new StopFilter( result, stoptable );
                result = new GermanStemFilter( result, excltable );
                // Convert to lowercase after stemming!
                result = new LowerCaseFilter( result );
                return result;
        }

As you can see, the analyzer converts all words to lowercase as its last
step, after stemming, to save some space. You can of course remove the
LowerCaseFilter to get case-sensitive search. Why holland returns 1 result
and hollAnd returns 22, I cannot say...
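
To illustrate only the lowercase point, here is a minimal sketch in plain Java. It is not Lucene code: FilterChainSketch, analyze and LOWER_CASE are hypothetical names made up for this example, and the "analyzer" is just a whitespace split followed by a list of per-token steps. It shows that with a lowercase step all spellings collapse to one term, and without it each spelling stays a separate term. It does not model stop words or stemming, so it says nothing about the holland case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical toy model of a filter chain; none of these names exist in Lucene.
public class FilterChainSketch {

    // Lowercase step, standing in for the role LowerCaseFilter plays in the chain.
    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Split the text on whitespace, then run every token through each step in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> steps) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields no terms
            }
            String term = token;
            for (UnaryOperator<String> step : steps) {
                term = step.apply(term);
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> withLowerCase = List.of(LOWER_CASE);
        List<UnaryOperator<String>> withoutLowerCase = List.of();

        // With the lowercase step: prints [holland, holland, holland]
        System.out.println(analyze("Holland HOLLAND holland", withLowerCase));
        // Without it: prints [Holland, HOLLAND, holland]
        System.out.println(analyze("Holland HOLLAND holland", withoutLowerCase));
    }
}
```

In the real analyzer the same chain is used at index time and at query time, so dropping the lowercase step would make both the indexed terms and the query terms case sensitive.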

Regards, Karl Øie



-----Original Message-----
From: Jan Stövesand [mailto:[EMAIL PROTECTED]]
Sent: 20 December 2001 12:36
To: Lucene Users List
Subject: Strange Results with German Analyzer


Hi,

I used a German Analyzer for indexing and searching. AFAIK, the search is
case insensitive. At least I get the same search results for

kapitalanlagen
Kapitalanlagen

But, for some words the Analyzer behaves somewhat funny:

Holland -> 22 results
hollAnd -> 22 results
hollanD -> 22 results
HOLLAND -> 22 results

holland -> 1 result (!) which is NOT in the 22 results mentioned above.

I have no idea why, and my knowledge about searching, stemming, indexing
etc. is, well, small.

Jan


--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

