On 03/02/11 06:35, Jizba, Richard wrote:
> Tim,
>
> Thanks for the response.
> The first option isn't an option for us because we need to search for
> numbers.
> I did see something like the second option, which said to basically
> comment out the PorterStemFilter.
> So my question is: can I eliminate that level of stemming altogether?
> This is what I want to do:
>
>   ===========================================
>      public final TokenStream tokenStream(String fieldName, final Reader reader)
>      {
>          TokenStream result = new DSTokenizer(reader);
>          result = new StandardFilter(result);
>          result = new LowerCaseFilter(result);
>          result = new StopFilter(result, stopSet);
>      /*    result = new PorterStemFilter(result); */
>          return result;
>      }
>   ============================================
>
> Will this 'break' anything?
>
> As I understand it, DSpace will then use the DSAnalyzer to parse the
> character data into words, convert them to lower case, and index the
> terms, excluding those on the stop list.

There is a nice exemplar patch for how to do these kinds of things right at:

http://www.mail-archive.com/[email protected]/msg00378.html

If you made such a patch and contributed it, it would not only fix the 
problem for you now: in future releases you would need only a simple 
config change instead of re-applying the fix.
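
As a rough illustration of what "a simple config change" could look like, 
here is a library-free Java sketch of the same chain (tokenize, lower-case, 
stop-filter, optionally stem) with the stemming step behind a switch. It 
deliberately does not use the Lucene or DSpace classes: the class name, the 
property key "search.analyzer.stemming", the default of "true" and the toy 
stemmer are all assumptions made up for this example, and a real patch would 
wrap PorterStemFilter in the analyzer instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Library-free illustration only; none of these names are DSpace or Lucene API.
final class ConfigurableAnalysisSketch {

    // Hypothetical property key, invented for this sketch.
    static final String STEMMING_KEY = "search.analyzer.stemming";

    // Tokenize, lower-case, drop stop words, and stem only if the config says so.
    static List<String> analyze(String text, Set<String> stopWords, Properties config) {
        // Assumed default: stemming stays on unless explicitly disabled.
        boolean stem = Boolean.parseBoolean(config.getProperty(STEMMING_KEY, "true"));
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter or digit, so numbers survive.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiter produces an empty first token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                continue;
            }
            terms.add(stem ? toyStem(term) : term);
        }
        return terms;
    }

    // Toy stand-in for a stemmer: strips one trailing "s". This is NOT Porter stemming.
    static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

With the property set to "false" the terms are indexed as typed (apart from 
lower-casing and stop words); with it unset, behaviour stays as before, which 
is what makes such a switch safe to contribute.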

A more comprehensive fix might be to build parallel indexes, one stemmed 
and stop-worded and one unstemmed and unstop-worded. The indexes would 
take up twice the disk space, but I think not too many people are 
worried about index disk space these days.
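
To make the parallel-index idea concrete, here is a small in-memory Java 
sketch, again without Lucene. Each document is added to two inverted indexes: 
one stemmed and stop-worded, one keeping every term as typed. All names are 
hypothetical, the stemmer is a toy that strips a trailing "s", and lower-casing 
both indexes is my own simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: two in-memory inverted indexes built from the same text.
final class ParallelIndexSketch {

    private final Map<String, Set<Integer>> stemmed = new HashMap<>();
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Set<String> stopWords;

    ParallelIndexSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Index one document into both indexes.
    void add(int docId, String text) {
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String term = raw.toLowerCase(Locale.ROOT);
            // Exact index: every term, no stemming, no stop list.
            exact.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            // Stemmed index: stop words dropped, remaining terms stemmed.
            if (!stopWords.contains(term)) {
                stemmed.computeIfAbsent(toyStem(term), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Fuzzy lookup: the query word goes through the same stop list and stemmer.
    Set<Integer> searchStemmed(String word) {
        String term = word.toLowerCase(Locale.ROOT);
        if (stopWords.contains(term)) {
            return Collections.emptySet();
        }
        return stemmed.getOrDefault(toyStem(term), Collections.emptySet());
    }

    // Exact lookup: the query word is only lower-cased.
    Set<Integer> searchExact(String word) {
        return exact.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Toy stand-in for a stemmer: strips one trailing "s". NOT Porter stemming.
    static String toyStem(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }
}
```

After adding "Vitamin A deficiency" as document 1 and "Vitamins and minerals" 
as document 2, a stemmed search for "vitamin" finds both documents, while an 
exact search for "a" still finds document 1 even though "a" is a stop word. 
The price is that every term is stored twice.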

> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.

A simple question, but there are complexities here you appear not to 
have thought of.
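
One such complexity, offered as an illustration rather than a complete list: 
a lower-casing step that leaves selected tokens alone cannot tell the "A" in 
"Vitamin A" from the article "A" at the start of a sentence. The Java sketch 
below is library-free, and the class name and the "keepAsIs" set are 
hypothetical, not an existing Lucene filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: lower-case every token except those listed in keepAsIs.
final class SelectiveLowerCaseSketch {

    static List<String> analyze(String text, Set<String> keepAsIs, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Tokens in keepAsIs skip lower-casing, so "A" never becomes the stop word "a".
            String term = keepAsIs.contains(raw) ? raw : raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                continue;
            }
            terms.add(term);
        }
        return terms;
    }
}
```

With keepAsIs = {"A"} and stop words {"a", "and", "of"}, the text 
"Vitamin A and a placebo" yields [vitamin, A, placebo], which is the desired 
result. But "A study of vitamin A" yields [A, study, vitamin, A]: the article 
is indexed too. Queries would also have to pass through the same step, so a 
user typing "vitamin a" in lower case would not match the retained "A".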

cheers
stuart
-- 
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/
