On 03/02/11 06:35, Jizba, Richard wrote:
> Tim,
>
> Thanks for the response.
> The first option isn't an option for us because we need to search for
> numbers.
> I did see something like the second option, which said to basically
> comment out the PorterStemFilter.
> So my question is: can I eliminate that level of stemming altogether?
> This is what I want to do:
>
> ===========================================
> public final TokenStream tokenStream(String fieldName, final Reader reader)
> {
>     TokenStream result = new DSTokenizer(reader);
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, stopSet);
>     /* result = new PorterStemFilter(result); */
>     return result;
> }
> ============================================
>
> Will this 'break' anything?
>
> As I understand it, DSpace will then use the DSAnalyzer, parse the
> character data into words, convert them to lower case and index the
> terms, excluding those on the stop list.
There is a nice exemplar patch for how to do these kinds of things right at:
http://www.mail-archive.com/[email protected]/msg00378.html
If you made such a patch and contributed it, it would not only fix the
problem for you now: instead of re-applying the fix to each future
release, you would only need a simple config change.
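To make the "simple config change" idea concrete, here is a minimal
sketch in plain Java. It deliberately does not use the Lucene API: the
class, the toy stemmer and the property name search.analyzer.stemming
are all hypothetical stand-ins, not existing DSpace settings. It only
shows the shape of a filter chain whose stemming step is switched by
configuration rather than by commenting out code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Plain-Java stand-in for an analyzer chain; it does not use the Lucene API.
class ConfigurableAnalyzerSketch {

    // Hypothetical property name, not an existing DSpace setting.
    static final String STEMMING_KEY = "search.analyzer.stemming";

    // Tiny illustrative stop list, not the DSpace one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Toy suffix stripper standing in for a real Porter stemmer.
    static String toyStem(String term) {
        if (term.length() > 4 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Tokenize, lowercase, drop stop words, and stem only if configured to.
    static List<String> analyze(String text, Properties config) {
        boolean stem = Boolean.parseBoolean(config.getProperty(STEMMING_KEY, "true"));
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(term)) {
                continue;
            }
            terms.add(stem ? toyStem(term) : term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(STEMMING_KEY, "false");
        System.out.println(analyze("Indexing the vitamins", config));
    }
}
```

With the property set to false the terms are indexed as written (after
lowercasing and stop-word removal). Leaving it unset keeps the stemmed
behaviour, so existing installations would not change.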
A more comprehensive fix might be to build parallel indexes, one stemmed
and stop-worded and one unstemmed and unstop-worded. The indexes would
take up twice the disk space, but I think not too many people are
worried about index disk space these days.
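A rough sketch of the parallel-index idea, again in plain Java with
in-memory maps rather than real Lucene indexes. All names are
hypothetical, the stemmer is a toy, and keeping the original case in the
exact index is my assumption rather than something DSpace does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Two in-memory inverted indexes built from the same documents.
class ParallelIndexSketch {

    // Tiny illustrative stop list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Stemmed, stop-worded, lowercased terms -> document ids.
    final Map<String, Set<Integer>> stemmed = new HashMap<>();
    // Terms exactly as written (assumption: case preserved) -> document ids.
    final Map<String, Set<Integer>> exact = new HashMap<>();

    // Toy suffix stripper standing in for a real Porter stemmer.
    static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Adds every token to the exact index and the processed form to the stemmed one.
    void add(int docId, String text) {
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(raw, k -> new TreeSet<>()).add(docId);
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                stemmed.computeIfAbsent(toyStem(lower), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks the term up unchanged in the exact index.
    Set<Integer> searchExact(String term) {
        return exact.getOrDefault(term, Collections.emptySet());
    }

    // Applies the same lowercasing and stemming to the query term as at index time.
    Set<Integer> searchStemmed(String term) {
        String key = toyStem(term.toLowerCase(Locale.ROOT));
        return stemmed.getOrDefault(key, Collections.emptySet());
    }
}
```

In this sketch a search for "A" against the exact index still finds
"Vitamin A", while a search for "vitamins" against the stemmed index
finds both singular and plural forms. The cost is the doubled storage
mentioned above, plus a decision at query time about which index to use.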
> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.
A simple question, but there are complexities here you appear not to
have thought of.
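Two complications that come to mind: a sentence-initial "A" is usually
just the article with a capital letter, and people type queries in
lower case. The sketch below is plain Java, not a Lucene filter, and the
class name and the sentence-boundary heuristic are my own assumptions.
It lowercases everything except a standalone "A" that does not open a
sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Lowercasing step that keeps a standalone capital "A" in mid-sentence.
class KeepCapitalASketch {

    static List<String> filter(String text) {
        List<String> out = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Crude heuristic: a token ending in . ! or ? closes a sentence.
            boolean endsSentence = raw.matches(".*[.!?]");
            // Strip punctuation so "A." and "A" are treated alike.
            String token = raw.replaceAll("[^\\p{L}\\p{N}]", "");
            if (!token.isEmpty()) {
                if (token.equals("A") && !sentenceStart) {
                    out.add(token); // likely a name component, e.g. "Vitamin A"
                } else {
                    out.add(token.toLowerCase(Locale.ROOT));
                }
            }
            sentenceStart = endsSentence;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter("A lack of Vitamin A."));
        // prints [a, lack, of, vitamin, A]
    }
}
```

Even this toy version misfiles a sentence that really does begin with
the name "A", and it does nothing for all-capital titles. A query for
"vitamin a" typed in lower case would also miss the preserved "A" unless
the query side is handled as well.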
cheers
stuart
--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/