Dear all,
for my taste the stopwords included in Lucene (e.g. StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used with the SnowballAnalyzer - and I guess also with the StandardAnalyzer) are not strict enough:
For example in a sentence with "we need ..." I would
consider "we" and "need" as stopwords but they are not
stripped by SnowballAnalyzer or StandardAnalyzer.
Now: is there a built-in way to strip more restrictively, or should I create my own analyzer with a more restrictive stopword list?
If so - are you aware of stricter lists? (A URI would be great!)
Have you seen this:
http://www.onjava.com/onjava/2003/01/15/examples/EnglishStopWords.txt
Though personally I would start with the default assumption that stop word lists are not needed at all unless you can "prove" you need one, e.g.
[1] the indexes are too big (though in theory this shouldn't happen because of stop words)
[2] you're doing some index analysis where you traverse terms and there are just too many
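
If you do decide you need a stricter list, here is a rough sketch of the idea in plain Java, without touching the Lucene API: merge a default list with your own additions and drop matching tokens. The class name StopwordSketch and both word arrays are made up for illustration; in particular DEFAULT_STOP_WORDS is only a placeholder and not the real contents of StopAnalyzer.ENGLISH_STOP_WORDS. As far as I remember, the analyzers also accept a custom stop word array in their constructors, but please check the javadoc of your Lucene version before relying on that.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopwordSketch {
    // Placeholder default list (assumption): NOT the real StopAnalyzer.ENGLISH_STOP_WORDS.
    static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "the", "to", "of"};

    // Hypothetical stricter additions, e.g. taken from a downloaded word list.
    static final String[] EXTRA_STOP_WORDS = {"we", "need", "i", "you"};

    // Merge any number of word lists into one lower-cased lookup set.
    static Set<String> merge(String[]... lists) {
        Set<String> merged = new HashSet<>();
        for (String[] list : lists) {
            for (String word : list) {
                merged.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return merged;
    }

    // Lower-case the text, split on runs of non-alphanumeric characters,
    // and keep only the tokens that are not in the stop word set.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> strict = merge(DEFAULT_STOP_WORDS, EXTRA_STOP_WORDS);
        // "we", "need" and "a" are removed with the stricter set.
        System.out.println(filter("We need a stricter stopword list", strict));
    }
}
```

With only the placeholder default list, "we" and "need" survive filtering; with the merged set they are dropped. That is the behaviour you would want from an analyzer built on a stricter list.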
Thanks,
Holger
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]