Christoph Kiehl wrote:
Hi Volker,

I have noticed a strange problem with capitalization. Search for
"computer" results in the token "compu". Search for "Computer",
however, results in "comput". The search is supposed to be
case-insensitive, so this must be a bug, right?
This problem was already mentioned on the developer list. The analyzer tries
to do some noun recognition. But it does a bad job ;)
The analyzer should not do any case recognition. After reading through the mailing list from the last few weeks/months (I was busy recently), I found that a super simple unique-discrimination algorithm is what most users need. The original algorithm offers more ways to extend it.

For now you could check out the current Lucene version from CVS and just
comment out the following line:

 uppercase = Character.isUpperCase( term.charAt( 0 ) );

Then just run Ant to build the jar. This fixes the problem you described.
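To see why a capitalization heuristic breaks case-insensitive search, here is a toy stemmer in Java. It is a hypothetical sketch, not the actual GermanStemmer code: the class `CaseFlagDemo`, the `useCaseFlag` parameter and both suffix rules are made up for illustration, chosen only so that the output reproduces the tokens reported above. Passing `useCaseFlag = false` is meant to resemble commenting out the `uppercase = ...` line, assuming the flag then stays false for every term.

```java
import java.util.Locale;

/**
 * Toy stemmer (hypothetical, NOT Lucene's GermanStemmer) that mimics the
 * reported symptom: a capitalization flag gates an extra suffix rule.
 */
public class CaseFlagDemo {

    // Strips a trailing "er"; if the term is not treated as a noun,
    // additionally strips a trailing "t" (a made-up verb-ending rule).
    static String stem(String term, boolean useCaseFlag) {
        // The problematic heuristic: guess "noun" from the first letter.
        boolean uppercase = useCaseFlag && Character.isUpperCase(term.charAt(0));
        StringBuilder buf = new StringBuilder(term.toLowerCase(Locale.ROOT));
        if (buf.length() > 4 && buf.toString().endsWith("er")) {
            buf.setLength(buf.length() - 2);
        }
        if (!uppercase && buf.length() > 4 && buf.charAt(buf.length() - 1) == 't') {
            buf.setLength(buf.length() - 1);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // With the heuristic: the two spellings get different stems.
        System.out.println(stem("computer", true) + " / " + stem("Computer", true));
        // Without it: both spellings get the same stem.
        System.out.println(stem("computer", false) + " / " + stem("Computer", false));
    }
}
```

Running `main` prints `compu / comput` with the heuristic and `compu / compu` without it. Once the flag is gone, the stem depends only on the lowercased text, so both spellings end up as the same index token.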
I promise I will check the stemmer in the next few days... hm... not before this weekend; I have a martial arts challenge on Sunday. Mentally, I'm not prepared to _fix_ anything. :)

There is another problem with the umlaut conversion that should also be checked.

Greets,
Gerhard
