[Bug 3821] scores are overoptimized for training set

bugzilla-daemon 27 Sep 2004 15:16:13 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3821

------- Additional Comments From [EMAIL PROTECTED]  2004-09-27 08:15 -------
Henry,

In theory, your ideas are good; in practice, though, they are not as effective.
Allow me to explain my point of view.

Your theory is based on the axiom that "a spam email will hit multiple rules".
Thus your genetic algorithm generates scores that are "as large as possible
without causing unnecessary false positives".

Unfortunately, in practice, spam emails may not hit multiple rules. In fact,
many, many, many times a spam email will only hit a BAYES_xx rule and nothing
else. Contributing factors are:

- spammer uses a new location to send spam (most network tests are not 
effective)
- spammer uses an image to advertise the product (even network tests can't
extract URLs)
- spammer has a carefully written email without common mistakes
- spammer uses a non-English language (most body rules are English-specific)

My experience has shown that BAYES_xx is the only thing that keeps this kind
of spam from passing through. With the current low BAYES_99 score, all of
these emails started passing through our system. Sure, there aren't many of
them, but this problem did not exist in SA 2.6x, and it makes our upgrade look
like it wasn't that good.
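
To make the point concrete, here is a minimal Python sketch of additive
scoring: a message's total is the sum of the scores of the rules it hits, and
it is flagged once the total reaches the threshold. The scores are made-up
numbers for illustration, NOT the real SA 3.0 values; only BAYES_99 is a real
rule name, the HYPOTHETICAL_* rules are placeholders, and 5.0 is assumed to be
the default required_score.

```python
# Illustrative sketch of additive rule scoring (not SpamAssassin's code).
REQUIRED_SCORE = 5.0  # assumed default required_score threshold

# Made-up scores; only BAYES_99 is a real rule name, the rest are hypothetical.
HYPOTHETICAL_SCORES = {
    "BAYES_99": 1.9,
    "HYPOTHETICAL_NETWORK_TEST": 1.5,
    "HYPOTHETICAL_URI_TEST": 1.6,
    "HYPOTHETICAL_BODY_TEST": 1.2,
}


def total_score(hits, scores):
    """Sum the scores of every rule the message hit."""
    return sum(scores[rule] for rule in hits)


def is_spam(hits, scores, threshold=REQUIRED_SCORE):
    """Flag the message once its total score reaches the threshold."""
    return total_score(hits, scores) >= threshold


# Spam matching the "hits multiple rules" assumption.
MULTI_RULE_SPAM = [
    "BAYES_99",
    "HYPOTHETICAL_NETWORK_TEST",
    "HYPOTHETICAL_URI_TEST",
    "HYPOTHETICAL_BODY_TEST",
]
# New sender, image-only, clean or non-English text: only Bayes fires.
BAYES_ONLY_SPAM = ["BAYES_99"]

for name, hits in [("multi-rule", MULTI_RULE_SPAM), ("Bayes-only", BAYES_ONLY_SPAM)]:
    score = total_score(hits, HYPOTHETICAL_SCORES)
    print(f"{name}: {score:.1f} -> spam={is_spam(hits, HYPOTHETICAL_SCORES)}")
```

With these numbers the multi-rule message clears the threshold, while the
Bayes-only message scores 1.9 and slips through no matter how confident the
Bayes classifier is.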

Please don't take this the wrong way; the developers of SA have done a superb
job, and I'm not criticising their decisions. I'm just pointing out a specific
case that the algorithm does not take into consideration.

Thank you.
