[Bug 5497] Bayes has become unusable

bugzilla-daemon Thu, 07 Jun 2007 21:46:42 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5497

------- Additional Comments From [EMAIL PROTECTED]  2007-06-07 21:46 -------
Changing the threshold from -1.0 to -0.1 might be useful; however, the stock
SpamAssassin ruleset contains very few negative-scoring rules to begin with,
and even fewer that score between these two levels.

Ignoring Bayes (since it is ignored for autolearning), the only additional rules
this change would allow to trigger nonspam autolearning are:

score RCVD_IN_IADB_OPTIN_GT50 0 -0.499 0 -0.245
score RCVD_IN_BSP_OTHER 0 -0.1 0 -0.1
score HABEAS_CHECKED 0 -0.2 0 -0.2

And if you have Hashcash enabled:
score HASHCASH_20 -0.500
score HASHCASH_21 -0.700


It's a start, but I think a better long-term solution would be to do what I've
been doing on my server for quite a while: use a very small negative score
(-0.001) as the threshold, and introduce several "nice" rules with similarly
small negative scores.
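A minimal local.cf sketch of that setup follows. The rule name LOCAL_NICE_EXAMPLE and its pattern are hypothetical placeholders, not rules from the stock ruleset or from my server. The SpamAssassin documentation describes bayes_auto_learn_threshold_nonspam as the score a message must fall *below*. It is therefore worth confirming how your version treats a message that hits exactly one -0.001 rule.

```
# Nonspam autolearning only for messages scoring below this threshold
# (Bayes is not counted), so a message needs "nice" hits and no spam hits.
bayes_auto_learn_threshold_nonspam -0.001

# Hypothetical "nice" rule; replace the pattern with something site-specific.
body     LOCAL_NICE_EXAMPLE  /\bexample industry term\b/i
describe LOCAL_NICE_EXAMPLE  Hypothetical nice rule for autolearn qualification
score    LOCAL_NICE_EXAMPLE  -0.001
```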

Because the rule scores are so small, this avoids the historical problem that
caused most of the negative-scoring comp rules to be removed: spammers crafted
their emails to rack up large numbers of these rules and effectively whitelist
the message. Here the scores are too small for that; you could rack up 20 of
them and get only -0.02 for your efforts.

Spammers could still do the same thing to make their messages "qualify" for
autolearning, but they'd also have to avoid all the spam rules.

The old system, with a small positive threshold, had the problem that nonspam
autolearning happened more or less "by default", as long as a message didn't hit
any spam rules. As a result, some new spam variants wound up being autolearned
as nonspam.

My suggestion here still has the same basic problem, but it at least adds some
hoops to jump through in order to qualify.

The biggest challenge here would be crafting rules that are at least somewhat
difficult for spammers to add arbitrarily to their messages. My rules don't meet
this bar: they rely on being "secret" to avoid detection, and are largely based
on "industry keywords" for my company.

Actually, that gives me an idea: why not make it a user-configured "goodwords"
file? Users could then add words related to *their* company and/or personal
interests. A plugin could scan for any of these words and trigger a single rule
scoring -0.001. We'd just have to pick a good name so people don't think it's a
whitelist system :)
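Short of writing such a plugin, a rough approximation is possible with plain local.cf rules. It uses one non-scoring `__` subrule per word, combined by a single meta rule that carries the -0.001 score. The rule names and words below are hypothetical placeholders, and this is only a sketch of the idea, not a proposed implementation.

```
# Hypothetical "goodwords" approximation: one subrule per word.
# Rules whose names start with __ do not contribute to the score on their own.
body  __LOCAL_GOODWORD_1  /\bwidgetry\b/i
body  __LOCAL_GOODWORD_2  /\bfrobnication\b/i
body  __LOCAL_GOODWORD_3  /\bgrommet\b/i

# Fire one small negative rule if any goodword appears, so matching many
# words still contributes only -0.001 in total.
meta     LOCAL_GOODWORDS  (__LOCAL_GOODWORD_1 || __LOCAL_GOODWORD_2 || __LOCAL_GOODWORD_3)
describe LOCAL_GOODWORDS  Message contains a site-configured "good" word
score    LOCAL_GOODWORDS  -0.001
```

A single body rule with an alternation would work just as well. Separate subrules simply keep the word list one entry per line, closer to the per-user file described above.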

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.