http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5497
Summary: Bayes has become unusable
Product: Spamassassin
Version: 3.2.0
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
I had great success with Bayes until very recently; the problems coincide with
the upgrade to 3.2.0. Our users have begun to see a completely unacceptable
false positive rate. No other changes have been made apart from the upgrade to
3.2.0.
I have attempted to truncate my bayes database and re-learn it (it's being
stored in a MySQL database). Over only a weekend's time (and with no end-user
input, only auto-learning input), the false positives were back, nearly always
caused by a large amount of ham being tagged with BAYES_99.
I also dropped the tables and recreated them from the provided .sql script, but
it doesn't appear to have made a difference.
Because of the end-user impact at my site, I've resorted to doing:
score BAYES_50 0
score BAYES_60 0
score BAYES_80 0
score BAYES_95 0
score BAYES_99 0
to correct the false positive rate.
I'm finding that several of the false positives are from Outlook 200x clients
using Microsoft Word as a (crappy) HTML generator... and because both ham and
spam share the same MSWord body content, many autolearned tokens are being input
as spam. I'm not sure if this specific platform has been a substantial cause of
my recent issues or not; it's just something I thought I'd mention.
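A less drastic option than zeroing the scores might be to restrict autolearning
itself, since that is where the bad tokens appear to come from. The local.cf
sketch below is untested here, and the threshold values are illustrative
assumptions rather than recommendations:
# Option A: stop autolearning entirely and rely on manual sa-learn training
bayes_auto_learn 0
# Option B: keep autolearning, but require a higher score before a message
# is learned as spam (the stock default is 12.0; 15.0 is an arbitrary example)
bayes_auto_learn_threshold_spam 15.0
# and leave the ham threshold at its stock default
bayes_auto_learn_threshold_nonspam 0.1
Only one of the two options would be used at a time, and the bayes database
would presumably need to be cleared and retrained for either to help.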
Has any of the bayes code been modified with 3.2.0? Has the learning grace
period been shortened? Has the autolearn code been modified? Has the Internet
trended in such a way that bayes poisoning is more common than it was a few
weeks ago? If so, would you consider lowering the point values for these rules
since they're growing less effective?
Or am I totally nuts and it's only my site that's having a substantially harder
time with bayes accuracy?
Thanks.
P.S. I did mail the users mailing list before reporting this problem here.