Many thanks to Tate, Otis, Eric [again :-)] and David. I am using the Snowball stemmer - so with the overloaded constructor for Snowball I guess a call would be:
    new SnowballAnalyzer("English", StopAnalyzer.MY_ENGLISH_STOP_WORDS);

where MY_ENGLISH_STOP_WORDS is a java.lang.String[] of the stopwords I would like to use. Is that the correct syntax for SnowballAnalyzer?

Thanks again,

Holger

On Thu, 22 Apr 2004 11:38:13 -0700, David Spencer wrote:
> [EMAIL PROTECTED] wrote:
>
> > Dear all,
> >
> > for my taste the stopwords included in Lucene (e.g.
> > StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
> > with the SnowballAnalyzer - and I guess also with the
> > StandardAnalyzer) are not strict enough:
> >
> > For example in a sentence with "we need ..." I would
> > consider "we" and "need" as stopwords but they are not
> > stripped by SnowballAnalyzer or StandardAnalyzer.
> >
> > Now:
> > Is there a built-in solution for more restrictive
> > stripping, or would I be better off creating my own
> > analyzer with a more restrictive stopword list?
> >
> > If so - are you aware of more rigid lists? (a URI
> > would be great!)
>
> Have you seen this:
>
> http://www.onjava.com/onjava/2003/01/15/examples/EnglishStopWords.txt
>
> Though personally I would start with the default assumption that stop
> word lists are not needed at all unless you can "prove" you need it, e.g.
> [1] the indexes are too big (though in theory this shouldn't happen
> because of stop words..)
> [2] you're doing some index analysis where you traverse terms and there
> are just too many
>
> > Thanks,
> >
> > Holger
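
On the stricter-list question at the top of this message: assuming the two-argument constructor takes a plain String[] as described there, one way to get a stricter list is to merge a baseline list with extra words. The sketch below uses only the JDK (no Lucene classes) to show what that does to the "we need ..." example. StopWordSketch, BASE_STOP_WORDS, EXTRA_STOP_WORDS, merge and filter are made-up names for illustration, and the short baseline array is only a stand-in, not the actual contents of StopAnalyzer.ENGLISH_STOP_WORDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordSketch {
    // Hypothetical baseline list; a stand-in for whatever default list is in use.
    static final String[] BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"};

    // Hypothetical stricter additions taken from the "we need ..." example.
    static final String[] EXTRA_STOP_WORDS = {"we", "need"};

    // Merge two stop word arrays, dropping duplicates and keeping first-seen order.
    static String[] merge(String[] base, String[] extra) {
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(base));
        merged.addAll(Arrays.asList(extra));
        return merged.toArray(new String[0]);
    }

    // Lowercase the text, split on non-alphanumeric runs, and drop stop words.
    static List<String> filter(String text, String[] stopWords) {
        Set<String> stops = new LinkedHashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "We need a stricter stop word list";
        String[] myStops = merge(BASE_STOP_WORDS, EXTRA_STOP_WORDS);
        // prints [we, need, stricter, stop, word, list]
        System.out.println(filter(text, BASE_STOP_WORDS));
        // prints [stricter, stop, word, list]
        System.out.println(filter(text, myStops));
    }
}
```

The merged array is the kind of String[] that would then be passed as the stop word argument to the analyzer. Whether the analyzer lowercases tokens before comparing them against the list is worth checking, since the list entries here are all lowercase.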