http://bugzilla.spamassassin.org/show_bug.cgi?id=4095
Summary: Using Bayesian Filters to score rules
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
I want to throw out a thought. I think we can get rid of scores for rules and
let a Bayesian filter do the scoring automatically.
Here's how it would work. We keep the rules, each with an indication of whether
it is initially a black or white rule. As the initial messages come in, they are
evaluated against the rules, and the list of triggered rules is fed into a
SEPARATE Bayesian filter that is used only to score rules. If the message is
sufficiently extreme ham or spam, then it is autolearned by ALL the Bayesian
filters.
Once the system is trained, the Bayesian filter for the rules is what
generates the score.
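To make the idea concrete, here is a rough sketch in Python of what such a rule-level filter could look like. Nothing here is SpamAssassin code: the class name RuleBayes, the starting probabilities of 0.8/0.2 for black/white rules, the autolearn thresholds, and the choice of a Robinson-style smoothed estimate combined in log-odds space are all assumptions made for illustration.

```python
import math
from collections import Counter


class RuleBayes:
    """Toy Bayesian filter whose tokens are triggered rule names (hypothetical)."""

    def __init__(self, initial, strength=1.0):
        # initial maps rule name -> "black" or "white"; 0.8 / 0.2 are assumed priors.
        self.prior = {r: (0.8 if c == "black" else 0.2) for r, c in initial.items()}
        self.strength = strength  # weight given to the prior versus observed hits
        self.spam_hits = Counter()
        self.ham_hits = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, rules, is_spam):
        """Record one message's triggered rules as spam or ham."""
        if is_spam:
            self.n_spam += 1
            self.spam_hits.update(set(rules))
        else:
            self.n_ham += 1
            self.ham_hits.update(set(rules))

    def rule_prob(self, rule):
        """Smoothed estimate that a message hitting this rule is spam."""
        x = self.prior.get(rule, 0.5)  # unknown rules start neutral
        s_hits, h_hits = self.spam_hits[rule], self.ham_hits[rule]
        n = s_hits + h_hits
        if n == 0:
            return x
        s_rate = s_hits / max(self.n_spam, 1)
        h_rate = h_hits / max(self.n_ham, 1)
        p = s_rate / (s_rate + h_rate)
        return (self.strength * x + n * p) / (self.strength + n)

    def score(self, rules):
        """Combine per-rule estimates into one fraction between 0 and 1."""
        log_odds = sum(
            math.log(p) - math.log(1.0 - p)
            for p in map(self.rule_prob, set(rules))
        )
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1.0 + e)

    def process(self, rules, low=0.1, high=0.9):
        """Score a message and autolearn it only if the result is extreme."""
        s = self.score(rules)
        if s >= high:
            self.learn(rules, True)
        elif s <= low:
            self.learn(rules, False)
        return s
```

Before any training, the score comes only from the black/white markings; as messages are autolearned, the observed hit counts gradually outweigh those starting values.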
We also have to rethink the idea of scores, because a score will be a fraction
between 0 and 1 instead of points that are added or subtracted. The result is
not a yes or no but rather a fraction that indicates how spammy or hammy the
message is. This result can be used to decide what to do with the message.
On my system I have got away from the "this is spam" model. I have many
classifications.
ham - autolearned
nonspam - not spam - but not sure enough to autolearn
low-spam - this is probably spam - but a few false positives end up here
high-spam - these messages are bounced to the sender - autolearned
veryhigh-spam - these messages are just dropped so as not to become bounce
spam - autolearned
pure-trash - I drop these at connect time
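As an illustration of how a fractional result could be mapped onto the classifications above, here is a small Python sketch. The cut points are invented for the example and would need tuning per server; pure-trash is left out because it is decided at connect time, before any score exists.

```python
from bisect import bisect_right

# Hypothetical cut points between the classes; not taken from any real setup.
CUTS = [0.05, 0.50, 0.90, 0.99]
LABELS = ["ham", "nonspam", "low-spam", "high-spam", "veryhigh-spam"]
# Classes the list above marks as autolearned.
AUTOLEARN = {"ham", "high-spam", "veryhigh-spam"}


def classify(score):
    """Return (label, autolearn?) for a spam fraction between 0 and 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    label = LABELS[bisect_right(CUTS, score)]
    return label, label in AUTOLEARN
```

With these assumed cut points, classify(0.7) gives ("low-spam", False), so the message is flagged but not fed back into training.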
The idea is that the scoring of these rules is automatic, based on the
reliability of the hits on these rules, and the score varies from server to
server based on the kind of spam and ham received. After the filter is trained
you can write any rule you want, and if you write a good rule it will develop a
good score. Rules whose scores settle in the middle (near 0.5, so they say
little either way) can be automatically culled.
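Culling could be as simple as the following Python sketch. The function name, the width of the "middle" band, and the minimum hit count (so that rarely triggered rules are not judged too early) are all assumptions for the example.

```python
def cull_candidates(rule_stats, band=0.1, min_hits=20):
    """List rules whose learned spam probability stays close to 0.5.

    rule_stats maps rule name -> (learned spam probability, total hits).
    band and min_hits are hypothetical tuning knobs.
    """
    return sorted(
        name
        for name, (prob, hits) in rule_stats.items()
        if hits >= min_hits and abs(prob - 0.5) <= band
    )
```

A rule with only a handful of hits is kept regardless of its probability, since it has not yet had a chance to develop a score.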
This Bayesian filter is separate and apart from the other Bayesian filters. The
other Bayesian filters report to this filter with their (fractional) results and
are also automatically evaluated.
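One way the other filters might report in is by turning each fractional result into a bucketed pseudo-rule name, so that the rule filter can learn how reliable each bucket is just like any other rule. The sketch below is Python; the naming scheme and the number of buckets are assumptions, not an existing interface.

```python
def fraction_to_token(filter_name, fraction, buckets=10):
    """Turn a filter's 0..1 result into a pseudo-rule name such as BODY_BAYES_09."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Clamp so that exactly 1.0 falls into the top bucket.
    bucket = min(int(fraction * buckets), buckets - 1)
    return f"{filter_name}_{bucket:02d}"
```

The resulting token is simply added to the list of triggered rules for the message, for example rules + [fraction_to_token("BODY_BAYES", 0.97)].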
I'm getting close to 99.9% accuracy with the tricks I'm doing so far. This one
can really kick the accuracy up there - but it requires a significant shift in
the way you think about spam and scoring.