Re: problems with search on Russian content

Karl Øie Fri, 22 Nov 2002 00:44:51 -0800

Sorry, my bad! Didn't read this informative post :-)

Regards, Karl Øie



On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote:

Look at the CHANGES.txt document in CVS - there is some new stuff in
the org.apache.lucene.analysis.ru package that you will want to use.
Get Lucene from the nightly build...

Otis

--- Andrey Grishin <[EMAIL PROTECTED]> wrote:

Hi All,
I have a problem with searching Russian content using Lucene 1.2.

I indexed the content using the Cp1251 charset
------------
text = new String(text.getBytes("Cp1251"));
doc.add(Field.Text(CONTENT_FIELD,text));

------------
and I am searching using the same charset

String txt = "В·Е’Ж’";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new
Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits = searcher.search(query);

or

Analyzer analyzer = new StandardAnalyzer();
String txt = "В·Е’Ж’вЂњв‰€В ";
txt = new String(txt.getBytes("Cp1251"));
Query query = QueryParser.parse(txt,
PortalHTMLDocument.CONTENT_FIELD, analyzer);

hits = searcher.search(query);


and Lucene can't find anything.
Also, I checked for the DecodeInterceptor in my server.xml - there
isn't one.

I tried UTF-8/16 and got the same result.

Also, if I list the index's contents by iterating over an IndexReader, I
can see that my Russian content is stored in the index...
Can you please help me? Do you have any more ideas about what else
can be done to fix this problem?

I would appreciate any help.
Thanks, Andrey.

P.S.
I am using lucene 1.2, tomcat 4.1.12, jdk 1.4.1 on Win2000 AS
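
A side note on the `new String(text.getBytes("Cp1251"))` round trip used in the snippets above: the one-argument `new String(byte[])` constructor decodes with the platform default charset, so the result depends on the machine's locale. Whether that contributes to the problem described here is not established by the thread; the following is only a minimal sketch of the effect, using the standard JDK alone. The class name `CharsetRoundTrip`, the helper `roundTrip`, and the sample word are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode text with one charset, then decode the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        // A Russian sample word written with Unicode escapes, so the
        // encoding of this source file does not matter.
        String word = "\u043f\u043e\u0438\u0441\u043a";

        // Same charset in both directions: the text survives unchanged.
        System.out.println(roundTrip(word, cp1251, cp1251).equals(word)); // prints true

        // Mismatched charsets: the Cyrillic letters come back as Latin-1 characters.
        System.out.println(
            roundTrip(word, cp1251, StandardCharsets.ISO_8859_1).equals(word)); // prints false

        // The form from the snippets above decodes with the platform default
        // charset, so this result varies from machine to machine.
        System.out.println(new String(word.getBytes(cp1251)).equals(word));
    }
}
```

A Java String is already Unicode, so when the text was read correctly in the first place, passing it to the indexer or the query without any byte round trip avoids this dependency on the default charset.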


