I got the noghtly build from the CVS
When I am trying to use IndexWriter this way:
writer = new IndexWriter(indexDirectory, new
RussianAnalyzer("Cp1251".toCharArray()), true);
I got the following exception
----------------------------------------------------------------------------
-----------------------
java.lang.ArrayIndexOutOfBoundsException: 7
at
org.apache.lucene.analysis.ru.RussianAnalyzer.makeStopWords(RussianAnalyzer.
java:521)
at org.apache.lucene.analysis.ru.RussianAnalyzer.(RussianAnalyzer.java:473)
----------------------------------------------------------------------------
-----------------------
When I am trying to use it this way:
writer = new IndexWriter(indexDirectory, new
RussianAnalyzer("Cp1251".toCharArray(), new String[] {}), true);
I got the following exception
----------------------------------------------------------------------------
-----------------------
2002-11-25 15:09:09,044
[ua.kiev.softline.services.searcher.index.PublishingIndexerImpl]
INFO - --Throwable in addArticle(): java.lang.ArrayIndexOut
OfBoundsException: 8
java.lang.ArrayIndexOutOfBoundsException: 8
at
org.apache.lucene.analysis.ru.RussianStemmer.isVowel(RussianStemmer.java:991
)
at
org.apache.lucene.analysis.ru.RussianStemmer.markPositions(RussianStemmer.ja
va:909)
at
org.apache.lucene.analysis.ru.RussianStemmer.stem(RussianStemmer.java:1551)
at
org.apache.lucene.analysis.ru.RussianStemFilter.next(RussianStemFilter.java:
189)
at
org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:17
0)
at
org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:111)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:209)
at
ua.kiev.softline.services.searcher.index.PublishingIndexerImpl.addArticle(Pu
blishingIndexerImpl.java:130)
----------------------------------------------------------------------------
-----------------------
When I commented line 575 in RussianAnalyzer.java
result = new RussianStemFilter(result, charset);
everything works fine - I can search (and find :)) russian words...
Am I doing something wrong?
Regards, Andrey
----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002 5:35 PM
Subject: Re: problems with search on Russian content
> Look at CHANGES.txt document in CVS - there is some new stuff in
> org.apache.lucene.analysis.ru package that you will want to use.
> Get the Lucene from the nightly build...
>
> Otis
>
> --- Andrey Grishin <[EMAIL PROTECTED]> wrote:
> > Hi All,
> > I have a problems with searching on Russian content using lucene 1.2
> >
> > I indexed the content using Cp1251 charset
> > ------------
> > text = new String(text.getBytes("Cp1251"));
> > doc.add(Field.Text(CONTENT_FIELD,text));
> >
> > ------------
> > and I am searching using the same charset
> >
> > String txt = "���";
> > txt = new String(txt.getBytes("Cp1251"));
> > PrefixQuery query = new PrefixQuery(new
> > Term(PortalHTMLDocument.CONTENT_FIELD, txt));
> > hits = searcher.search(query);
> >
> > or
> >
> > Analyzer analyzer = new StandardAnalyzer();
> > String txt = "������";
> > txt = new String(txt.getBytes("Cp1251"));
> > Query query = QueryParser.parse(txt,
> > PortalHTMLDocument.CONTENT_FIELD, analyzer);
> >
> > hits = searcher.search(query);
> >
> >
> > and lucene can't find nothing.
> > Also I checked for the DecodeInterceptor in my server.xml - there
> > isn't any
> >
> > I tried UTF-8/16 - and got the same result.
> >
> > Also, if I list all index's content via iterating IndexReader - I can
> > see that my russian content is stored in index...
> > Can you please help me? Do you have any more ideas about what else
> > can be done here to fix this problem?
> >
> > I will appreciate any help.
> > Thanks, Andrey.
> >
> > P.S.
> > I am using lucene 1.2, tomcat 4.1.12, jdk 1.4.1 on Win2000 AS
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
> --
> To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
>
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>