[Dspace-tech] Diacritics search results

Brian Freels-Stendel Mon, 13 Sep 2010 13:46:02 -0700

Hi All,

I'm having trouble with search results.  Searching for a word with its 
diacritic and without it returns different sets of results.


I've fixed it with the recommendation from 
http://www.mail-archive.com/[email protected]/msg00558.html, 
with one small update: ISOLatin1AccentFilter is now imported from Lucene, 
since it is now part of the Lucene package (see below).

I'm concerned that this is no longer the best way to go about it.  Should 
DSAnalyzer.java be copied to the modules directory?  (I actually tried to do 
that with another customization from the dspace-api directory and it didn't 
work, but I want to rule out that I simply wasn't doing it right.)

We're running DSpace 1.6.2 on RedHat5 with sun-java-1.6 and tomcat 5.

TIA for any advice!

B--


Modify org.dspace.search.DSAnalyzer.java:

1)  
    import org.apache.lucene.analysis.Analyzer;
+    import org.apache.lucene.analysis.ISOLatin1AccentFilter;
    import org.apache.lucene.analysis.LowerCaseFilter;

2) 
      /*
       * Create a token stream for this analyzer.
       */
      public final TokenStream tokenStream(String fieldName, final Reader reader)
      {
          TokenStream result = new DSTokenizer(reader);

          result = new StandardFilter  (result);
          result = new LowerCaseFilter (result);
          result = new StopFilter      (result, stopTable);
          result = new PorterStemFilter(result);
+         result = new ISOLatin1AccentFilter(result);

          return result;
      }
3) Run index-init to rebuild the index.
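
For anyone who wants to see the effect outside of DSpace, here is a minimal 
sketch of the idea behind the accent filter.  It is NOT the Lucene 
ISOLatin1AccentFilter: it uses java.text.Normalizer from the JDK instead, and 
the class name AccentFoldDemo and the method fold are made up for 
illustration.  It may not match the Lucene filter for every character 
(ligatures, for instance), but it shows why a term typed with and without 
its diacritic ends up as the same term once folding is applied.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustration only: a stand-in for accent folding, not Lucene's filter.
class AccentFoldDemo {

    // Decompose accented characters (NFD), then drop the combining marks,
    // so that "n with tilde" becomes a plain "n".
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent.
        String[] queries = { "Mu\u00f1oz", "Munoz", "r\u00e9sum\u00e9", "resume" };
        for (String q : queries) {
            // Lower-casing mirrors the LowerCaseFilter step in the analyzer.
            System.out.println(q + " -> " + fold(q).toLowerCase(Locale.ROOT));
        }
    }
}
```

The same folding has to be applied to both the indexed text and the query, 
which is presumably why the index has to be rebuilt after the analyzer 
changes.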



