problems with search on Russian content

Andrey Grishin Thu, 21 Nov 2002 05:39:48 -0800

Hi All, 
I have a problem searching Russian content using Lucene 1.2.


I indexed the content using the Cp1251 charset:
------------
text = new String(text.getBytes("Cp1251"));
doc.add(Field.Text(CONTENT_FIELD,text));

------------
and I am searching using the same charset:

String txt = "���";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits = searcher.search(query);

or 

Analyzer analyzer = new StandardAnalyzer();
String txt = "������";
txt = new String(txt.getBytes("Cp1251"));
Query query = QueryParser.parse(txt, PortalHTMLDocument.CONTENT_FIELD, analyzer);

hits = searcher.search(query);


and Lucene finds nothing.
I also checked for a DecodeInterceptor in my server.xml; there isn't one.

I also tried UTF-8 and UTF-16 and got the same result.
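
To help narrow this down, here is a minimal JDK-only sketch that checks whether the getBytes/new String round trip used in the snippets above changes the text. The class name, the method names and the sample word are assumptions made for illustration; they are not part of my application or of Lucene.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the application or of Lucene.
public class CharsetRoundTrip {

    // Same pattern as in the snippets above: encode with a named charset,
    // then decode with the platform default charset (none is passed in).
    public static String decodeWithDefault(String s, String charset) {
        try {
            return new String(s.getBytes(charset));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    // Encode and decode with the same named charset.
    public static String decodeWithSame(String s, String charset) {
        try {
            return new String(s.getBytes(charset), charset);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample word ("poisk", Russian for "search"), written with
        // Unicode escapes so the source file encoding does not matter.
        String word = "\u043f\u043e\u0438\u0441\u043a";
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        // This result depends on the platform default charset.
        System.out.println("default decode unchanged: "
                + decodeWithDefault(word, "Cp1251").equals(word));
        // Cp1251 can represent all five letters, so this round trip is lossless.
        System.out.println("same-charset decode unchanged: "
                + decodeWithSame(word, "Cp1251").equals(word));
    }
}
```

If the first check prints false on a given machine, the text handed to Lucene at index time and at query time may not be the original text, which could be one thing to rule out here.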

Also, if I list the index's contents by iterating over an IndexReader, I can see that my
Russian content is stored in the index.
Can you please help me? Do you have any ideas about what else I can do to
fix this problem?

I would appreciate any help.
Thanks, Andrey.

P.S.
I am using Lucene 1.2, Tomcat 4.1.12, JDK 1.4.1 on Win2000 AS
