Problem solved.
Well, there's nothing like simple code :-) The thing is, I was using a
custom analyzer with the following code:
class CustomAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
{
    public static readonly System.String[] PORTUGUESE_STOP_WORDS = new System.String[] {
        "a", "uma", "e", "são", "como", "onde", "ser", "mas", "por",
        "se", "em", "dentro", "é", "ela", "ele", "não", "de", "ou", "s",
        "tais", "t", "que", "seus", "então", "lá", "esses", "eles",
        "isto", "para", "era", "irá", "com" };

    public CustomAnalyzer() : base(StandardAnalyzer.STOP_WORDS)
    {
    }

    // Build the analysis chain: tokenize, apply standard filtering,
    // lowercase, strip accents, then remove the Portuguese stop words.
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new ISOLatin1AccentFilter(result);
        result = new StopFilter(result, PORTUGUESE_STOP_WORDS);
        return result;
    }
}
This custom analyzer doesn't work well in Lucene 2.3. If I simply pass the
array of stop words to the StandardAnalyzer constructor instead, it works
quite well.
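For reference, the working approach could look like the sketch below — a minimal, hypothetical rewrite (the class name PortugueseAnalyzer is mine) that keeps StandardAnalyzer's own token stream and only swaps in the Portuguese stop words via the base constructor, instead of overriding TokenStream:

```csharp
using Lucene.Net.Analysis.Standard;

// Hypothetical sketch of the fix: let StandardAnalyzer build its own
// analysis chain and only replace the default stop-word list.
class PortugueseAnalyzer : StandardAnalyzer
{
    public static readonly System.String[] PORTUGUESE_STOP_WORDS = new System.String[] {
        "a", "uma", "e", "são", "como", "onde", "ser", "mas", "por",
        "se", "em", "dentro", "é", "ela", "ele", "não", "de", "ou", "s",
        "tais", "t", "que", "seus", "então", "lá", "esses", "eles",
        "isto", "para", "era", "irá", "com" };

    // Pass the custom stop words to the base constructor; no override needed.
    public PortugueseAnalyzer() : base(PORTUGUESE_STOP_WORDS)
    {
    }
}
```

Note that this sketch drops the ISOLatin1AccentFilter step of the original chain, so accented characters are indexed as typed rather than folded to their ASCII equivalents.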
Thank you very much for your help, Digy.
Al
-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Friday, October 2, 2009 19:28
To: [email protected]
Subject: RE: indexing special characters
I don't remember any backward-compatibility-related bug report. I used the
following code to test 2.0 and 2.3.2 and didn't see any difference.
RAMDirectory dir = new RAMDirectory();
IndexWriter wr = new IndexWriter(dir, new StandardAnalyzer(), true);

Document doc = new Document();
Field f = new Field("field1", "café algodão", Field.Store.YES, Field.Index.TOKENIZED);
doc.Add(f);
wr.AddDocument(doc);
wr.Close();

IndexSearcher sr = new IndexSearcher(dir);
QueryParser qp = new QueryParser("field1", new StandardAnalyzer());
Query q = qp.Parse("algodão");
MessageBox.Show(sr.Search(q).Length().ToString());
sr.Close();
Can you send a simple test case showing the difference between versions?
DIGY
-----Original Message-----
From: Monteiro, Alvaro [mailto:[email protected]]
Sent: Friday, October 02, 2009 6:45 PM
To: [email protected]
Subject: indexing special characters
I've started using the latest build of Lucene.Net (2.3).
It is a lot faster than 2.0.
I've noticed something very strange: although the indexing process is the
same, when I use the latest DLL and search for a word with a special
character (like "café" or "algodão"), no results are returned.
However, if I switch the DLL back to 2.0 and index the exact same thing,
searching for a word of this kind returns results. No change in the code
whatsoever!
Does anyone have any idea?
Thank you so much.
Alvaro Monteiro