https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7720

            Bug ID: 7720
           Summary: Bayes plugin uses English specific stop words
           Product: Spamassassin
           Version: 3.4 SVN branch
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: Undefined

When the Bayes plugin tokenizes a message, it ignores words shorter than 3
characters, along with a list of commonly occurring words, called stop words,
that lie in a gray area (they do not affect the spam detection process). As I
understand it, this is mainly done to speed up computation and reduce storage.
However, if a user's primary language is not English (e.g. Spanish or French),
receiving a mail that contains English stop words is a strong indication of
spam. For these users, it would be helpful if the removal of stop words were
made configurable.
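To illustrate the request, here is a minimal Python sketch of a tokenizer in
which the minimum token length and the stop-word list are parameters rather
than hard-coded. The function name `tokenize`, the `min_len` and `stop_words`
parameters, and the sample `ENGLISH_STOP_WORDS` set are all hypothetical; this
is not the Bayes plugin's actual (Perl) code or its real stop-word list.

```python
import re
from typing import FrozenSet, Iterable, List, Optional

# Hypothetical sample of English stop words, NOT SpamAssassin's actual list.
ENGLISH_STOP_WORDS: FrozenSet[str] = frozenset(
    {"the", "and", "you", "for", "are", "with", "that", "this"}
)


def tokenize(
    body: str,
    min_len: int = 3,
    stop_words: Optional[Iterable[str]] = ENGLISH_STOP_WORDS,
) -> List[str]:
    """Split text into lowercase word tokens.

    Tokens shorter than min_len are dropped (mirroring the length<3 rule
    described in this report). Tokens found in stop_words are dropped too;
    pass stop_words=None (or an empty set) to keep every token.
    """
    stops = frozenset(w.lower() for w in stop_words) if stop_words else frozenset()
    tokens: List[str] = []
    # \w is Unicode-aware for str patterns, so accented letters stay in words.
    for word in re.findall(r"\w+", body.lower()):
        if len(word) < min_len:
            continue  # too short to be kept as a token
        if word in stops:
            continue  # configured stop word
        tokens.append(word)
    return tokens
```

With the default list, English function words are discarded. Passing
`stop_words=None` keeps them as tokens, so a Bayes database trained by a
Spanish- or French-speaking user could learn them as spam indicators, while
passing a Spanish list would filter that language's function words instead.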
