[Bug 3821] scores are overoptimized for training set

bugzilla-daemon 27 Sep 2004 21:51:30 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3821






------- Additional Comments From [EMAIL PROTECTED]  2004-09-27 14:51 -------
Subject: Re:  scores are overoptimized for training set

Hi Dimitris,

The scores were generated from a sample of over 850000 e-mails submitted 
by multiple users.  BAYES_xx was most likely scored so low because Bayes 
busting has made it less effective than it once was.  If you feel that 
the BAYES_xx scores are too low for you, then you should increase them.
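
To illustrate why raising a single score matters: SpamAssassin adds up 
the scores of every rule a message hits and compares the total against 
required_score (5.0 by default).  The sketch below is only a toy model of 
that sum.  The score values and the URIBL_EXAMPLE rule name are 
hypothetical, chosen for illustration, and are not the shipped scores.

```python
# Toy model of score summation. All score values here are hypothetical
# and chosen for illustration; they are not the shipped SpamAssassin scores.
SCORES = {
    "BAYES_99": 1.9,        # hypothetical low Bayes score
    "HTML_MESSAGE": 0.1,    # hypothetical value
    "URIBL_EXAMPLE": 3.0,   # hypothetical rule name and value
}

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits, scores=SCORES):
    """Sum the scores of all rules that a message hit."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores=SCORES, required=REQUIRED_SCORE):
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(hits, scores) >= required


# A message that hits nothing but the Bayes rule stays under the threshold.
bayes_only = ["BAYES_99"]
print(is_spam(bayes_only))

# A local override that raises BAYES_99 changes the outcome for that message.
local_scores = dict(SCORES, BAYES_99=5.0)
print(is_spam(bayes_only, local_scores))
```

In a real installation the equivalent override is a "score BAYES_99 
<value>" line in your local configuration; the value to use is your own 
judgment call and depends on how well trained your Bayes database is.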

I'm not going to address your theory about how spam e-mails don't hit 
multiple rules.  I'd suggest running mass-check on your own corpus or 
examining the mass-check logs in the submit directory on the rsync 
server.  If you find that your results are that different, it is fairly 
easy to run the score optimizer in order to personalise your scores.

Henry

>------- Additional Comments From [EMAIL PROTECTED]  2004-09-27 08:15 -------
>Henry,
>
>In theory, your ideas are good, though in practice they are not as effective.
>Allow me to explain my point of view.
>
>Your theory is based on the axiom that "a spam email will hit multiple rules".
>Thus your genetic algorithm generates scores that are "as large as possible
>without causing unnecessary false positives".
>
>Unfortunately, in practice, spam emails may not hit multiple rules. In fact,
>very often a spam email will only hit a BAYES_xx rule and nothing
>else. Contributing factors are: 
>
>- spammer uses a new location to send spam (most network tests are not 
>effective)
>- spammer uses an image to advertise the product (even network tests can't
>extract URLs)
>- spammer has a carefully written email without common mistakes
>- spammer uses a non-English language (most body rules are English-specific)
>
>My experience has shown that BAYES_xx is the only thing that keeps this kind
>of spam from passing through. With the current low BAYES_99 score, all
>these emails started passing through our system. Sure, there aren't many, but
>this problem did not exist in SA 2.6x, and it makes it look like our upgrade
>wasn't that good.
>
>Please don't take this the wrong way; the developers of SA have done a superb
>job and I'm not criticising their decisions. I am just pointing out a specific
>case that the algorithm does not take into consideration.
>
>Thank you.
>
>
>
>------- You are receiving this mail because: -------
>You are the assignee for the bug, or are watching the assignee.
>  
>





