https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7720
Bug ID: 7720
Summary: Bayes plugin uses English specific stop words
Product: Spamassassin
Version: 3.4 SVN branch
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Plugins
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: Undefined
When the Bayes plugin tokenizes a message, it ignores words shorter than 3
characters, along with a list of commonly occurring words (stop words) that are
assumed to fall in the gray area, i.e. to have no effect on spam detection. As I
understand it, this is done mainly to speed up computation and reduce storage.
However, if a user's primary language is not English (e.g. Spanish or French),
English stop words in a mail are a strong indication of spam. For these users it
would be helpful if the removal of stop words were configurable.
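
To illustrate the request, here is a minimal sketch in Python. It is not the
Bayes plugin's code (the plugin is written in Perl). The stop-word list, the
function name tokenize and the remove_stop_words switch are all made up for
this example; only the "shorter than 3 characters" rule comes from the
description above.

```python
import re

# Hypothetical stop-word list, for illustration only; this is NOT the list
# the Bayes plugin actually uses.
ENGLISH_STOP_WORDS = frozenset(
    {"the", "and", "for", "with", "that", "this", "from", "have"}
)

# From the description above: words shorter than 3 characters are ignored.
MIN_TOKEN_LENGTH = 3


def tokenize(text, remove_stop_words=True, stop_words=ENGLISH_STOP_WORDS):
    """Split text into lowercase tokens, optionally dropping stop words.

    remove_stop_words is the hypothetical configuration switch this report
    asks for; it is an assumption, not an existing option.
    """
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        if len(word) < MIN_TOKEN_LENGTH:
            continue  # short words are always skipped
        if remove_stop_words and word in stop_words:
            continue  # skipped only when stop-word removal is enabled
        tokens.append(word)
    return tokens
```

With remove_stop_words=False, English stop words such as "the" or "for" remain
in the token stream, so a classifier trained on mostly Spanish or French mail
could learn them as a spam signal.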