[Bug 5497] Bayes has become unusable

bugzilla-daemon Thu, 07 Jun 2007 01:47:58 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5497

------- Additional Comments From [EMAIL PROTECTED]  2007-06-07 01:47 -------
> The bayes algorithms take this into account, so it
> should be compensated for just fine

It should compensate for different absolute numbers of ham vs. spam in the
collection, but it can't compensate for a collection process that is biased
against some class of ham. For example, consider that all Outlook Express mail
that contains embedded graphics in HTML as cid MIME objects triggers the
EXTRA_MPART_TYPE rule for 1.0 point. No ham that hits that rule will be
autolearned, while all high-scoring spam that hits it will be. If there are any
tokens that are characteristic of that kind of mail, Bayes sees them only in
spam, so the effect will be to amplify the EXTRA_MPART_TYPE FP from producing
just 1 extra point to producing 1 point plus a high score from Bayes.

That's how I interpreted what is going on here. The summary describes that kind
of amplification of FPs on tokens found in HTML generated by MS Word.
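
To make the amplification concrete, here is a toy calculation. It is only a
sketch: the message counts, the scores, the two autolearn thresholds, and the
per-token formula are all illustrative assumptions, not SpamAssassin's actual
code (the real Bayes implementation smooths token probabilities and combines
many tokens). The token stands for anything characteristic of mail with cid
images and is equally common in real ham and real spam.

```python
# Toy model of the autolearn bias; every number here is made up.
HAM_AUTOLEARN_MAX = 0.1    # assumed: learned as ham only below this score
SPAM_AUTOLEARN_MIN = 12.0  # assumed: learned as spam only at or above this score


def token_spam_prob(spam_with, spam_total, ham_with, ham_total):
    """Simplified per-token spam probability from learned message counts."""
    spam_rate = spam_with / spam_total
    ham_rate = ham_with / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def learned_counts(messages):
    """Count what score-based autolearning would feed to Bayes.

    Each message is (score_without_bayes, has_token).
    Returns (spam_with, spam_total, ham_with, ham_total).
    """
    spam_with = spam_total = ham_with = ham_total = 0
    for score, has_token in messages:
        if score >= SPAM_AUTOLEARN_MIN:
            spam_total += 1
            spam_with += has_token
        elif score < HAM_AUTOLEARN_MAX:
            ham_total += 1
            ham_with += has_token
        # Anything in between is never learned at all.
    return spam_with, spam_total, ham_with, ham_total


messages = (
    [(0.0, False)] * 800    # ordinary ham
    + [(1.0, True)] * 200   # ham with cid images: rule hit adds 1.0
    + [(15.0, False)] * 800  # ordinary high-scoring spam
    + [(16.0, True)] * 200   # high-scoring spam with cid images
)

# What a fully hand-trained database would see: the token is neutral.
true_prob = token_spam_prob(200, 1000, 200, 1000)

# What autolearning sees: the 200 ham with the token are skipped.
learned = learned_counts(messages)
biased_prob = token_spam_prob(*learned)

print(true_prob, biased_prob)  # 0.5 1.0
```

In this sketch the token is neutral in the real mail stream, but autolearning
never sees it in ham, so it ends up looking like a pure spam indicator.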




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
