[EMAIL PROTECTED] wrote:

Dear all,

for my taste the stopwords included in Lucene (e.g.
StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
with the SnowballAnalyzer - and I guess also with the
StandardAnalyzer) are not strict enough:

For example in a sentence with "we need ..." I would
consider "we" and "need" as stopwords but they are not
stripped by SnowballAnalyzer or StandardAnalyzer.


Now:
Is there a built-in way to use more restrictive
stripping, or should I create my own analyzer in that
case with a more restrictive stopword list?

If so - are you aware of any stricter lists? (a URI
would be great!)

Have you seen this:


http://www.onjava.com/onjava/2003/01/15/examples/EnglishStopWords.txt

Personally, though, I would start from the default assumption that a stop word list is not needed at all unless you can "prove" you need one, e.g.
[1] the indexes are too big (though in theory stop words alone shouldn't cause this)
[2] you're doing some index analysis that traverses terms, and there are just too many of them
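
If you do decide you need a stricter list, here is a rough sketch of what the extra stripping amounts to. It is plain Java rather than Lucene's API: the class name, the filter() helper and the short word list are all made up for illustration, and the tokenizing is deliberately naive (lowercase, split on non-letters). In Lucene itself you would hand your own list to the analyzer instead; check the Javadoc of the version you run for the exact constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StrictStopwordSketch {

    // Hypothetical stricter list: a handful of common function words
    // plus "we" and "need" from the example sentence. Not Lucene's list.
    static final Set<String> STRICT_STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "to", "of", "we", "need"));

    // Lowercase the text, split on runs of non-letter characters,
    // and keep only the tokens that are not in the stop word set.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [stricter, stopword, list]
        System.out.println(filter("We need a stricter stopword list", STRICT_STOP_WORDS));
    }
}
```

Running the same text through filter() with and without the extra words is also a cheap way to see how many terms a stricter list would really remove before you commit to it.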





Thanks,


Holger






---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





