Re: problems with search on Russian content

Otis Gospodnetic Thu, 21 Nov 2002 07:37:05 -0800

Look at the CHANGES.txt document in CVS: there is some new stuff in the
org.apache.lucene.analysis.ru package that you will want to use.
Get Lucene from the nightly build...
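
A side note on the quoted snippet: new String(text.getBytes("Cp1251"))
encodes the text as Cp1251 bytes and then decodes those bytes with the
platform default charset, so the text only survives when the default
happens to be Cp1251. A Java String is already Unicode and normally
needs no re-encoding before it is indexed or searched; the charset
matters when the bytes are first read. Here is a minimal sketch of the
effect, using only the JDK. The class name CharsetRoundTrip, the helper
reencode and the sample word are my own assumptions for illustration,
not taken from the original code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Hypothetical helper: encode with one charset, decode with another.
    // new String(text.getBytes("Cp1251")) behaves like
    // reencode(text, cp1251, Charset.defaultCharset()).
    static String reencode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");

        // Sample Russian word ("search"), written with Unicode escapes so the
        // source file encoding cannot interfere.
        String word = "\u043f\u043e\u0438\u0441\u043a";

        // Same charset on both sides: the text survives.
        String same = reencode(word, cp1251, cp1251);
        System.out.println("Cp1251 -> Cp1251 intact: " + same.equals(word));

        // Different charset on decode, as happens when the platform default
        // is not Cp1251: the text is silently corrupted.
        String latin1 = reencode(word, cp1251, StandardCharsets.ISO_8859_1);
        System.out.println("Cp1251 -> ISO-8859-1 intact: " + latin1.equals(word));

        String utf8 = reencode(word, cp1251, StandardCharsets.UTF_8);
        System.out.println("Cp1251 -> UTF-8 intact: " + utf8.equals(word));
    }
}
```

If the indexing side and the searching side run with different default
charsets, or the page text was decoded with the wrong charset to begin
with, the indexed terms and the query terms will not match even though
both "look" Russian when listed.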


Otis

--- Andrey Grishin <[EMAIL PROTECTED]> wrote:
> Hi All, 
> I have a problem searching Russian content using Lucene 1.2
> 
> I indexed the content using Cp1251 charset
> ------------
> text = new String(text.getBytes("Cp1251"));
> doc.add(Field.Text(CONTENT_FIELD,text));
> 
> ------------
> and I am searching using the same charset
> 
> String txt = "���";
> txt = new String(txt.getBytes("Cp1251"));
> PrefixQuery query = new PrefixQuery(new
> Term(PortalHTMLDocument.CONTENT_FIELD, txt));
> hits = searcher.search(query);
> 
> or 
> 
> Analyzer analyzer = new StandardAnalyzer();
> String txt = "������";
> txt = new String(txt.getBytes("Cp1251"));
> Query query = QueryParser.parse(txt,
> PortalHTMLDocument.CONTENT_FIELD, analyzer);
> 
> hits = searcher.search(query);
> 
> 
> and Lucene can't find anything.
> Also I checked for the DecodeInterceptor in my server.xml - there
> isn't any
> 
> I tried UTF-8/16 - and got the same result.
> 
> Also, if I list the whole index content by iterating over an
> IndexReader, I can see that my Russian content is stored in the index...
> Can you please help me? Do you have any more ideas about what else
> can be done here to fix this problem?
> 
> I will appreciate any help.
> Thanks, Andrey.
> 
> P.S.
> I am using lucene 1.2, tomcat 4.1.12, jdk 1.4.1 on Win2000 AS

