Hi All,
I have a problem searching Russian content using Lucene 1.2.
I indexed the content using the Cp1251 charset:
------------
text = new String(text.getBytes("Cp1251"));
doc.add(Field.Text(CONTENT_FIELD,text));
------------
and I am searching using the same charset:
String txt = "���";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits = searcher.search(query);
or
Analyzer analyzer = new StandardAnalyzer();
String txt = "������";
txt = new String(txt.getBytes("Cp1251"));
Query query = QueryParser.parse(txt, PortalHTMLDocument.CONTENT_FIELD, analyzer);
hits = searcher.search(query);
and Lucene finds nothing.
I also checked my server.xml for a DecodeInterceptor - there isn't one.
I tried UTF-8 and UTF-16 as well and got the same result.
Also, if I list the index contents by iterating over an IndexReader, I can see that my
Russian content is stored in the index.
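
For reference, here is a minimal sketch of what the conversion used above, new String(text.getBytes("Cp1251")), does on its own, outside Lucene and Tomcat. It is not taken from the application: the class name, the helper names, and the sample word are all made up for illustration. The point is that getBytes("Cp1251") encodes with Cp1251, but the one-argument String constructor decodes with the platform default charset, so the result depends on the machine it runs on.

```java
import java.io.UnsupportedEncodingException;

public class CharsetRoundTrip {

    /** Encodes a string with an explicitly named charset. */
    public static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
    }

    /** Decodes raw bytes with an explicitly named charset. */
    public static String decode(byte[] raw, String charsetName) {
        try {
            return new String(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
    }

    /** The pattern from the post: encode with charsetName, decode with the platform default. */
    public static String viaPlatformDefault(String s, String charsetName) {
        return new String(encode(s, charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical sample word, written with Unicode escapes so that the
        // encoding of this source file does not affect the result.
        String word = "\u043f\u043e\u0438\u0441\u043a";

        byte[] cp1251 = encode(word, "Cp1251");
        System.out.println("bytes in Cp1251: " + cp1251.length);
        System.out.println("explicit decode matches: "
                + decode(cp1251, "Cp1251").equals(word));
        // This line depends on the platform default charset.
        System.out.println("default decode matches: "
                + viaPlatformDefault(word, "Cp1251").equals(word));
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
    }
}
```

If the platform default charset is Cp1251, the last comparison is true and the conversion changes nothing. With any other default, the string may come back with different characters than it went in with. Java strings are already Unicode internally, so it may be worth checking where the bytes are first decoded into a String rather than re-encoding afterwards.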
Can you please help me? Do you have any ideas about what else I could try to
fix this problem?
I would appreciate any help.
Thanks, Andrey.
P.S.
I am using Lucene 1.2, Tomcat 4.1.12, and JDK 1.4.1 on Win2000 AS.