Besides Scott's monthly stats showing which test catches the most spam, I
wondered what each individual test contributes to catching as much spam as
possible while producing as few false positives as possible. (On an MTA
processing legit messages from over 1000 mailboxes.)

The calculation is based on the assumption that the weighting system on our
server catches over 97% of spam with around 0.01% false positives. (We
review all held spam between 100% and 200% of our hold weight and keep note
of every requeued legit message.)

So when parsing the logfiles I assume that the final weight is "correct",
which tells me whether a message is spam or legit. Then I look at the
individual result of each test and check whether it counted in the "right"
direction.

For example:
Final weight: 120 points   => it's spam
BASE64: 10 points          => right result
SPAMCOP: 10 points         => right result 
NOLEGITCONTENT: -5 points  => wrong result
...
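
In Python the scoring would look roughly like this (the hold weight of 100
and the per-test points dict are just assumptions for illustration; the
actual logfile parsing is left out):

HOLD_WEIGHT = 100  # assumed hold weight from our config

def score_tests(final_weight, test_points):
    """Trust the final weight, then judge each test's direction.

    test_points maps a test name to the points it contributed.
    Returns test name -> 'right', 'wrong' or 'no result'.
    """
    is_spam = final_weight >= HOLD_WEIGHT
    verdicts = {}
    for test, points in test_points.items():
        if points == 0:
            verdicts[test] = 'no result'
        elif (points > 0) == is_spam:
            verdicts[test] = 'right'  # counted in the same direction
        else:
            verdicts[test] = 'wrong'  # counted against the final verdict
    return verdicts

# The example above: final weight 120 => spam
print(score_tests(120, {'BASE64': 10, 'SPAMCOP': 10, 'NOLEGITCONTENT': -5}))
# -> {'BASE64': 'right', 'SPAMCOP': 'right', 'NOLEGITCONTENT': 'wrong'}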

The result is a table with four values for each individual test:

Dark green:  right result for spam message
Light green: right result for legit message
Dark red:    wrong result for spam message
Light red:   wrong result for legit message

Besides the absolute numbers, I've also created a diagram with relative
values that additionally shows for how many messages each test returned no
result at all (grey).
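
The tally behind the table and the bars can be sketched the same way (same
assumptions as above; this presumes the log lists a test with zero points
when it returned no result):

from collections import Counter

def tally(messages, hold_weight=100):
    """messages: iterable of (final_weight, {test: points}) pairs.

    Returns test name -> Counter over the four table values plus 'grey'.
    """
    buckets = {}
    for final_weight, test_points in messages:
        is_spam = final_weight >= hold_weight
        for test, points in test_points.items():
            c = buckets.setdefault(test, Counter())
            if points == 0:
                c['grey'] += 1  # test returned no result
            elif (points > 0) == is_spam:
                c['dark green' if is_spam else 'light green'] += 1
            else:
                c['dark red' if is_spam else 'light red'] += 1
    return buckets

def relative(counts):
    """Turn absolute counts into the percentages shown in the bars."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}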

You can find the results at www.zcom.it/decludeupdater/spam_stats.htm

Notes: 
1.) By design, most tests can return only positive or only negative
results, but some tests can return both positive (voting for spam) and
negative (voting for legit) results. An IP4R test, for example, usually has
(or should have) a positive result for spam and no result for legit
messages, so it can never vote right for a legit message or wrong for a
spam message.
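
With the tally() sketch from above this is easy to see: a positive-only
test like IP4R can only end up dark green, light red or grey, never light
green or dark red:

demo = tally([
    (120, {'IP4R': 10}),  # spam, test fired   -> dark green
    (5,   {'IP4R': 10}),  # legit, test fired  -> light red (FP)
    (5,   {'IP4R': 0}),   # legit, no result   -> grey
])
print(demo['IP4R'])
# -> Counter({'dark green': 1, 'light red': 1, 'grey': 1})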

2.) At first the table may be a little confusing. Hovering over the
relative bars will show a short explanation.

3.) In short: the more green you see, the better. Red is bad.

4.) If you can't see a red bar in the relative values, it means there were
not enough false positives to reach at least 1% in the diagram; you may
still spot a few false positives in the absolute numbers. Very few tests
are as free of false positives as John Tolmachoff's AUTOWHITE (its only FP
was caused by a spam-test message containing a lot of typical spam
keywords).

5.) Since I assume that the final weight is right, it can happen that one
or more tests actually vote "right" while the final weight is not correct
(spam getting through the filters, or a legit message held as a false
positive). In that case even the tests with the right vote earn a count in
the red values. For example, if a spam message slips through below the hold
weight, every test that correctly scored it positive is counted as a wrong
result for a "legit" message. But since I know that we already have a
well-balanced weighting system, these wrong counts should be very rare.

Any comments or suggestions are welcome!
Hope this helps and you can understand my "english" :-)

Markus


