Check out org.apache.lucene.analysis.ISOLatin1AccentFilter

It strips diacritics. Just be sure to use it at both index time and query time, otherwise the indexed terms and the query terms will not match. Also, you will no longer be able to differentiate between accented and unaccented forms (e.g. "lumiere" vs. "lumière") in your searching (rarely that important in my opinion, but others certainly disagree).
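
To illustrate why the folding has to happen on both sides, here is a minimal plain-Java sketch. It does not use Lucene at all: the AccentFoldDemo class and its fold helper are hypothetical names, and java.text.Normalizer only stands in for what ISOLatin1AccentFilter would do inside an analyzer chain, so treat it as an approximation rather than the filter's actual behaviour.

```java
import java.text.Normalizer;

public class AccentFoldDemo {

    // Hypothetical helper: decompose each character into base letter plus
    // combining marks (NFD), then drop the combining marks.
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String indexedWord = "lumi\u00e8re"; // "lumière", as stored in a document
        String queryWord = "lumiere";        // what the user types

        // Without folding, the two terms differ, so the query misses the document.
        System.out.println(indexedWord.equals(queryWord));

        // With the same folding applied at index time and at query time,
        // both sides reduce to the same term.
        System.out.println(fold(indexedWord).equals(fold(queryWord)));
    }
}
```

If only one side is folded, the mismatch simply moves: a folded query "lumiere" still cannot match an unfolded indexed term "lumière".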

- Mark

Christophe from paris wrote:
Hello

I use FrenchAnalyzer for indexing:

IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("TXT_CHARACT_VALUE", word.toLowerCase(), Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);

And for searching:

IndexReader reader = IndexReader.open(pathOfIndex);
Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new FrenchAnalyzer();

QueryParser parser = new QueryParser(field, analyzer);

Query query = parser.parse(motRecherche);
Hits hits = searcher.search(query);

My documents contain the words "lumiere" and "lumière".

When I search for "lumière", only documents containing "lumière" match; "lumiere" is not returned.

If I search for "lumiere", the results are lumiere, lumieres, lumiére and lumiéres, but not lumière.

To match everything I have to search for "lumiere OR lumière", but that is not the best solution.

