Check out org.apache.lucene.analysis.ISOLatin1AccentFilter
It will strip diacritics - just be sure to use it at index time and
query time to get what you want. Also, you will no longer be able to
differentiate between the two in your searching (rarely that important
in my opinion, but others certainly disagree).
- Mark
Christophe from paris wrote:
Hello
I'm use FrenchAnalyzer for index
IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(),
true);
Document = new Document();
doc.add(new
Field("TXT_CHARACT_VALUE",word.toLowerCase(),Field.Store.YES,Field.Index.TOKENIZED));
writer.addDocument(doc);
And search
IndexReader reader = IndexReader.open(pathOfIndex);
Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new FrenchAnalyzer();
QueryParser parser = new QueryParser(field, analyzer);
Query query = parser.parse(motRecherche);
Hits hits = searcher.search(query);
in my document i have the word "lumiere" and "lumière"
when i search lumière only document match lumière but "lumiere" is not
return
and if search "lumiere" the result is lumiere, lumieres ,lumiére,lumiéres
but not lumière
for a total match i must search "lumiere OR limière"
but is not the best solution
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]