Bug in lucene located and fixed

tremont romain Tue, 19 Nov 2002 01:23:08 -0800

Hi folks,

A little while ago, Olivier Perrin was having trouble indexing and
searching text in Greek or Russian.


I dug into the source code, learned a little about JavaCC, and here is
what I found:

When you don't specify the option UNICODE_INPUT, the character table
that gets created is an ASCII table. (A character table is different
from an encoding format! Unicode 3.0 is a character table and UTF-8 is a
character encoding.) So when dealing with characters beyond those in the
ASCII table, JavaCC does not recognize them.
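
To make the table-versus-encoding distinction concrete, here is a small
Java sketch. The class and method names are hypothetical and are not part
of Lucene or JavaCC. It only shows that a Cyrillic letter has a code
point outside the ASCII range (0 to 127), independently of how many
bytes it takes once encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class CharTableDemo {
    // True when the character falls inside the 7-bit ASCII table (0..127).
    static boolean isAscii(char c) {
        return c < 128;
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        char latin = 'a';          // U+0061
        char cyrillic = '\u0416';  // U+0416, CYRILLIC CAPITAL LETTER ZHE

        System.out.println(isAscii(latin));                       // true
        System.out.println(isAscii(cyrillic));                    // false
        System.out.println((int) cyrillic);                       // 1046
        System.out.println(utf8Length(String.valueOf(cyrillic))); // 2
    }
}
```

The code point (1046) is the position in the character table; the two
bytes are just one possible encoding of it.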

For example, a Russian character is not in the table. So when the
QueryParser or the StandardAnalyzer receives one, it doesn't know what
to do with it and aborts.

Just adding UNICODE_INPUT = true; to both .jj files fixed the
problem.
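
For reference, a sketch of what the change looks like in the options
block at the top of a .jj grammar file. UNICODE_INPUT is a standard
JavaCC option; whatever other options the Lucene grammars already set
are assumed to stay unchanged and are not shown here.

```
options {
  // ... existing options left as they are ...
  UNICODE_INPUT = true;  // build the character table for Unicode instead of ASCII
}
```

After editing the grammars, the parser sources have to be regenerated
with JavaCC and Lucene rebuilt for the change to take effect.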

Sorry for my poor English; I hope you got the idea. I can submit a
small patch if wanted, but I'm not used to diff :)


 
-- 
trémont romain <[EMAIL PROTECTED]>
A.I.S. http://www.xml-ais.com

