This was the problem, it worked excellent!
Thanks for the help!
-Albert
karl wettin-3 wrote:
>
>
> 27 aug 2007 kl. 20.30 skrev anorman:
>
>>
>> I've tried to implement an analyzer with little different then using:
>>
>> result = new ISOLatin1AccentFilter(result); in the TokenStream
>> method.
>>
>> Everything appears to work, however my search will not work for any
>> word
>> with diacritics with that change. Without using it will find words
>> such as
>> "cèdulas" but not "cedulas", with the change it will find neither,
>> it just
>> appears to be stripping it out altogether.
>
> Do you use the same analyzer when searching as when creating the index?
>
> --
> karl
>
>
>>
>> Any suggestions?
>>
>>
>>
>>
>> anorman wrote:
>>>
>>> Can I do this at search time rather than index time? Below is my
>>> code
>>> that is handling the searching, where would I utilize such a filter?
>>>
>>> Thanks for the help!
>>>
>>>
>>>
>>>
>>> package search.lucene.search;
>>> import org.apache.lucene.document.Document;
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.apache.lucene.analysis.Analyzer;
>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
>>> import org.apache.lucene.queryParser.ParseException;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.search.Hits;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>>
>>> import search.lucene.index.IndexManager;
>>>
>>> /**
>>> * This class is used to search the
>>> * Lucene index and return search results
>>> */
>>>
>>> public class SearchManager {
>>>
>>>
>>> private String searchWord;
>>>
>>> private IndexManager indexManager;
>>>
>>> private Analyzer analyzer;
>>>
>>> public SearchManager(String searchWord){
>>> this.searchWord = searchWord;
>>> this.indexManager = new IndexManager();
>>> this.analyzer = new StandardAnalyzer();
>>> }
>>>
>>> /**
>>> * do search
>>> */
>>> public List search(){
>>> List searchResult = new ArrayList();
>>>
>>> IndexSearcher indexSearcher = null;
>>>
>>> try{
>>> indexSearcher = new IndexSearcher
>>> (indexManager.getIndexDir());
>>> }catch(IOException ioe){
>>> ioe.printStackTrace();
>>> }
>>>
>>> QueryParser queryParser = new QueryParser
>>> ("content",analyzer);
>>> Query query = null;
>>> try {
>>> query = queryParser.parse(searchWord);
>>> } catch (ParseException e) {
>>> e.printStackTrace();
>>> }
>>>
>>> if(null != query && null != indexSearcher){
>>> try {
>>> Hits hits = indexSearcher.search(query);
>>> for(int i = 0; i < hits.length(); i ++){
>>>
>>> Document doc = hits.doc(i);
>>> System.out.println(doc.get("filename"));
>>>
>>> SearchResultBean resultBean = new
>>> SearchResultBean();
>>>
>>>
>>> resultBean.setXMLId(hits.doc(i).get("id"));
>>>
>>> resultBean.setXMLTitle(hits.doc(i).get("title"));
>>>
>>> resultBean.setXMLAuthor(hits.doc(i).get("author"));
>>>
>>> resultBean.setXMLAbstract(hits.doc(i).get("abstract"));
>>> resultBean.setScore(hits.score(i));
>>>
>>> searchResult.add(resultBean);
>>> }
>>> } catch (IOException e) {
>>> e.printStackTrace();
>>> }
>>> }
>>> return searchResult;
>>>
>>> }
>>>
>>>
>>>
>>>
>>>
>>>
>>> thomas arni-2 wrote:
>>>>
>>>> You can extend the DefaultAnalyzer.
>>>> The only thing you have to do, is to rewrite the method
>>>> tokenStream like
>>>> this:
>>>>
>>>> /** Constructs a [EMAIL PROTECTED] StandardTokenizer} filtered by a
>>>> [EMAIL PROTECTED]
>>>> StandardFilter}, a [EMAIL PROTECTED] LowerCaseFilter} and a [EMAIL
>>>> PROTECTED]
>>>> StopFilter}. */
>>>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>>> TokenStream result = new StandardTokenizer(reader);
>>>> result = new StandardFilter(result);
>>>> result = new LowerCaseFilter(result);
>>>> result = new StopFilter(result, stopSet);
>>>> result = new ISOLatin1AccentFilter(result);
>>>> return result;
>>>> }
>>>>
>>>>
>>>> anorman wrote:
>>>>> This looks like exactly what I want. Would I implement this
>>>>> along with
>>>>> another analyzer such as the standard or stand alone? Does
>>>>> anyone have
>>>>> any
>>>>> code examples of implementing such a thing?
>>>>>
>>>>> Thanks,
>>>>> Albert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> karl wettin-3 wrote:
>>>>>
>>>>>> 27 aug 2007 kl. 16.03 skrev anorman:
>>>>>>
>>>>>>
>>>>>>> I have a searchable index of documents which contain french and
>>>>>>> spanish
>>>>>>> diacritics (è, é, À) etc. I would like to make the content
>>>>>>> searchable so
>>>>>>> that when a user searches for a word such as "Amèrique" or
>>>>>>> "Amerique"
>>>>>>> (without diacritic) then it returns the same results.
>>>>>>>
>>>>>>> Has anyone set up something similar?
>>>>>>>
>>>>>> ISOLatin1AccentFilter
>>>>>>
>>>>>> --
>>>>>> karl
>>>>>> ------------------------------------------------------------------
>>>>>> ---
>>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------------
>>>> -
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/Searching-
>> Diacritics-tf4335454.html#a12354962
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
--
View this message in context:
http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12366572
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]