Hi Richard,
By default, DSpace uses its own custom Lucene Search analyzer
(org.dspace.search.DSAnalyzer) which does perform stemming by default.
We made this decision as most major search engines (even Google) perform
some level of stemming (though major search engines do tend to have
better ranking algorithms than what is available in Lucene).
There are a few options available to you if you want to disable stemming:
== First Option ==
You can switch DSpace to use the "Standard" search analyzer
(org.apache.lucene.analysis.standard.StandardAnalyzer) provided by
Lucene. This standard analyzer does not perform any stemming, but (as
far as I can recall) it also doesn't allow searching on numbers.
So, using this analyzer, if you search "1000" nothing will be returned
(even if a title included the number "1000"). In addition, as stemming
is off, obviously if you search "testing", you'll only get results for
exact matches for "testing" and not matches for "test" or "tester".
Similarly, a search for "cats" will only return results for exact
matches ("cats") and not for "cat".
To switch to using the Lucene Standard Analyzer, do the following:
1) Edit your 'dspace.cfg' file, and change the value of 'search.analyzer'
to the following (by default it is commented out):
search.analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
2) Stop Tomcat (taking down your DSpace instance)
3) Reindex all content in your DSpace by running:
[dspace]/bin/dspace index-init
4) Start Tomcat & test
== Second Option ==
The other option is to customize the DSpace Lucene Search analyzer to no
longer do stemming. However, this will require a bit of Java
programming, so I'd only recommend this if you are comfortable with
Java, or have a Java programmer on staff. In this option, you could
create your own Lucene Search Analyzer, based on a copy of
'org.dspace.search.DSAnalyzer'. In that copy, you'd want to remove the
following line which calls the Lucene Stemming function:
result = new PorterStemFilter(result);
After that, you'd have to rebuild DSpace, and then follow the same steps
as in the "First Option", except that 'search.analyzer' in 'dspace.cfg'
should name your custom Lucene Search Analyzer class instead.
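
To illustrate why removing that one line changes search behavior, here is a
minimal, self-contained Java sketch of an analyzer as a chain of token
filters. It deliberately does not use any Lucene or DSpace classes:
'AnalyzerChainSketch', 'toyStem' and the other names are hypothetical, and
'toyStem' is a crude suffix-stripper standing in for PorterStemFilter (it is
not the Porter algorithm). It only shows that dropping the stemming step
leaves the indexed terms as exact (lowercased) words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class AnalyzerChainSketch {

    // Toy stand-in for a stemming filter: strips a few common suffixes.
    // Assumption for illustration only; this is NOT the Porter algorithm.
    public static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "er", "s"}) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Splits the text on whitespace and runs each token through the filters, in order.
    public static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String term = token;
            for (UnaryOperator<String> filter : filters) {
                term = filter.apply(term);
            }
            terms.add(term);
        }
        return terms;
    }

    // Chain with the stemming step, loosely analogous to the default behavior.
    public static List<String> analyzeWithStemming(String text) {
        UnaryOperator<String> lowerCase = t -> t.toLowerCase(Locale.ROOT);
        UnaryOperator<String> stem = AnalyzerChainSketch::toyStem;
        return analyze(text, Arrays.asList(lowerCase, stem));
    }

    // Same chain with the stemming step removed.
    public static List<String> analyzeWithoutStemming(String text) {
        UnaryOperator<String> lowerCase = t -> t.toLowerCase(Locale.ROOT);
        return analyze(text, Arrays.asList(lowerCase));
    }

    public static void main(String[] args) {
        System.out.println(analyzeWithStemming("Testing tester cats"));    // [test, test, cat]
        System.out.println(analyzeWithoutStemming("Testing tester cats")); // [testing, tester, cats]
    }
}
```

With the stemming step in place, "testing", "tester" and "test" all collapse
to the same term, so a search for one matches the others. Without it, each
word is indexed as typed, which is the exact-match behavior described under
the First Option.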
Good luck!
- Tim
On 2/1/2011 9:02 AM, Jizba, Richard wrote:
> The default stemming algorithm is so heavy handed that it has really
> created problems for us and thinking about it over night I realized that
> there are a number of other likely problems our users haven’t
> encountered but will soon.
>
> I need a fairly complete list of steps for disabling stemming. Our
> current version of DSpace is 1.6.2 and we are planning to move to 1.7
> soon. However, this issue is serious enough that I don’t want to wait,
> or I may have a significant supporter of DSpace at my institution become
> a real critic.
>
> *Richard Jizba*
>
> Health Sciences Library
>
> Creighton University
>
> (402) 280-5142
>
> [email protected]
>
>
>
> ------------------------------------------------------------------------------
> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
> Finally, a world-class log management solution at an even better price-free!
> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
> February 28th, so secure your free ArcSight Logger TODAY!
> http://p.sf.net/sfu/arcsight-sfd2d
>
>
>
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech