Problem solved.
Well, there's nothing like simple code :-) The thing is, I was using a
custom analyzer with the following code:
class CustomAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
{
    public static readonly System.String[] PORTUGUESE_STOP_WORDS = new System.String[] {
        "a", "uma", "e", "são", "como", "onde", "ser", "mas", "por",
        "se", "em", "dentro", "é", "ela", "ele", "não", "de", "ou", "s",
        "tais", "t", "que", "seus", "então", "lá", "esses", "eles",
        "isto", "para", "era", "irá", "com" };

    public CustomAnalyzer() : base(StandardAnalyzer.STOP_WORDS)
    {
    }

    // Build the analysis chain: tokenize, apply standard filtering,
    // lowercase, strip accents, then remove the Portuguese stop words.
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new ISOLatin1AccentFilter(result);
        result = new StopFilter(result, PORTUGUESE_STOP_WORDS);
        return result;
    }
}
This custom analyzer doesn't work well in Lucene 2.3. If I simply pass the
array of stop words to the StandardAnalyzer constructor instead, it works
quite well.
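For reference, the working approach could look like the sketch below — a minimal, hypothetical rewrite (the class name PortugueseAnalyzer is mine) that keeps StandardAnalyzer's own token stream and only swaps in the Portuguese stop words via the base constructor, instead of overriding TokenStream:

```csharp
using Lucene.Net.Analysis.Standard;

// Hypothetical sketch of the fix: let StandardAnalyzer build its own
// analysis chain and only replace the default stop-word list.
class PortugueseAnalyzer : StandardAnalyzer
{
    public static readonly System.String[] PORTUGUESE_STOP_WORDS = new System.String[] {
        "a", "uma", "e", "são", "como", "onde", "ser", "mas", "por",
        "se", "em", "dentro", "é", "ela", "ele", "não", "de", "ou", "s",
        "tais", "t", "que", "seus", "então", "lá", "esses", "eles",
        "isto", "para", "era", "irá", "com" };

    // Pass the custom stop words to the base constructor; no override needed.
    public PortugueseAnalyzer() : base(PORTUGUESE_STOP_WORDS)
    {
    }
}
```

Note that this sketch drops the ISOLatin1AccentFilter step of the original chain, so accented characters are indexed as typed rather than folded to their ASCII equivalents.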
Thank you very much for your help, Digy.
Al
-----Original Message-----
From: Digy [mailto:[email protected]]
Sent: Friday, October 2, 2009 19:28
To: [email protected]
Subject: RE: indexing special characters
I don't remember any backward-compatibility-related bug report. I used the
following code to test 2.0 and 2.3.2 and didn't see any difference.
RAMDirectory dir = new RAMDirectory();
IndexWriter wr = new IndexWriter(dir, new StandardAnalyzer(), true);

Document doc = new Document();
Field f = new Field("field1", "café algodão", Field.Store.YES, Field.Index.TOKENIZED);
doc.Add(f);
wr.AddDocument(doc);
wr.Close();

IndexSearcher sr = new IndexSearcher(dir);
QueryParser qp = new QueryParser("field1", new StandardAnalyzer());
Query q = qp.Parse("algodão");
MessageBox.Show(sr.Search(q).Length().ToString());
sr.Close();
Can you send a simple test case showing the difference between versions?
DIGY
-----Original Message-----
From: Monteiro, Alvaro [mailto:[email protected]]
Sent: Friday, October 02, 2009 6:45 PM
To: [email protected]
Subject: indexing special characters
I've started using the latest build of Lucene.Net (2.3).
It is a lot faster than 2.0.
I've noticed something very strange: although the indexing process is the
same, when I use the latest DLL and search for a word with a special
character (like "café" or "algodão"), no results are returned.
However, if I switch the DLL back to 2.0 and index the exact same thing,
searching for a word of this kind returns results. No change in the code
whatsoever!
Does anyone have any idea?
Thank you so much.
Alvaro Monteiro