Re: search with accent not match

Christophe from paris Thu, 07 Aug 2008 05:27:38 -0700

Yes  markrmiller,the order is important
then 

 TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);  
    result = new StopFilter(result, stoptable);    
    result = new ISOLatin1AccentFilter(result);
    result = new FrenchStemFilter(result, excltable);
    result = new LowerCaseFilter(result);


And finaly with ISOLatin1AccentFilter the result is good :)

tanks you.

Now go the polish search ^^


markrmiller wrote:
> 
> You certainly can - just create your own Analyzer starting with a copy 
> of the French one you are using.
> 
> Then you just plug in the filter in the order you want it applied:
> 
> result = new ISOLatin1AccentFilter(result);
> 
> You have to decide for yourself where it will come - if you put it 
> before the stopword step, more stops words might be removed than if it 
> was after - that type of thing usually comes down to individual 
> requirements/filter limitations. If your stopword list has diacriticals 
> and you run the accent filter before applying the stopword list, some 
> expected stopwords will never be removed...etc.
> 
> 
> Christophe from paris wrote:
>> Actualy in my FrenchAnalyser 
>>
>> i have :
>>
>>  TokenStream result = new StandardTokenizer(reader);
>>     result = new StandardFilter(result);
>>     result = new StopFilter(result, stoptable);
>>     result = new FrenchStemFilter(result, excltable);
>>     result = new LowerCaseFilter(result);
>>
>>
>> I can use ISOLatin1AccentFilter in this Class for indexing ans search ?
>> And it is the case where ?
>>
>>
>> markrmiller wrote:
>>   
>>> Check out org.apache.lucene.analysis.ISOLatin1AccentFilter
>>>
>>> It will strip diacritics - just be sure to use it at index time and 
>>> query time to get what you want. Also, you will no longer be able to 
>>> differentiate between the two in your searching (rarely that important 
>>> in my opinion, but others certainly disagree).
>>>
>>> - Mark
>>>
>>> Christophe from paris wrote:
>>>     
>>>> Hello
>>>>
>>>> I'm use FrenchAnalyzer for index 
>>>>
>>>> IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(),
>>>> true);
>>>> Document = new Document();
>>>> doc.add(new
>>>> Field("TXT_CHARACT_VALUE",word.toLowerCase(),Field.Store.YES,Field.Index.TOKENIZED));
>>>> writer.addDocument(doc);
>>>>
>>>> And search
>>>>
>>>> IndexReader reader = IndexReader.open(pathOfIndex);                        
>>>> Searcher searcher = new IndexSearcher(reader);
>>>> Analyzer analyzer = new FrenchAnalyzer();                                  
>>>>         
>>>> QueryParser parser = new QueryParser(field, analyzer);                     
>>>>                 
>>>> Query query = parser.parse(motRecherche);
>>>> Hits hits = searcher.search(query);
>>>>
>>>> in my document i have the word "lumiere" and "lumière"
>>>>
>>>> when i search lumière only document match lumière but "lumiere" is not
>>>> return
>>>>
>>>> and if search "lumiere" the result is lumiere, lumieres
>>>> ,lumiére,lumiéres
>>>> but not lumière
>>>>
>>>> for a total match i must search "lumiere OR limière"
>>>> but is not the best solution 
>>>>   
>>>>       
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/search-with-accent-not-match-tp18848522p18869247.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: search with accent not match

Reply via email to