Indexing and searching non-latin languages using utf-8

MERCIER ALEXANDRE Tue, 18 Mar 2003 08:34:56 -0800

Hi all,

I've a matter with indexing then searching docs written in non-latin
languages and encoded in utf-8 (Russian, by example).


I have a web application, with a simple form to search in the contents of
the docs.
When I submit the form, I encode the query term in utf-8 with
encodeURI(String) but I match no doc. I think that is due to a bad indexing
but I'm not sure.

Lucene is normally indexing docs in writing Terms in the 'xxx.tis' file,
encoding it in utf-8, I believe.
So when it reads the file, it correctly gets russian characters (2 bytes)
but when writing them in the index, they seem different (I've listed the
terms in my application console).

If someone has a solution to resolve my problem, all advices are welcome.

Thanks.
Alex


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Indexing and searching non-latin languages using utf-8

Reply via email to