I don't think it's possible to calculate the weight of an individual test strictly from its catch/failure rate.
 
On http://www.zcom.it/spamtest/ you can see what we generate from our daily logfiles.
 
In my opinion it's not enough to just count right and wrong results.
 
Theoretically, there are 5 possible results for every individual test:
  1. correct result for a spam message
    For example SPAMCOP has a positive result for a spam message
  2. wrong result for a spam message
    For example NOLEGITCONTENT has a positive result (and so will subtract points) for a spam message
  3. correct result for a legit message
    For example AUTOWHITE has a positive result (and so will subtract points) for a legit message
  4. wrong result for a legit message
    For example REVDNS has a positive result for a legit message
  5. no result
    For example no line in a FILTER file matches anything in the legit or spam message
In practice, most spam tests have only 3 possible results because they count only as a positive or as a negative test. For example, SPAMCOP can't fail on a spam message because its result is either a positive weight or no weight (unless you decide to assign a negative weight whenever SPAMCOP doesn't return a positive result, which I haven't considered).
A test like NOLEGITCONTENT, on the other hand, will only subtract points, or return zero as its result if no legit content was found.
 
Some tests like SPAMCHK can return a positive weight, a negative weight, or zero, and so they can produce all 5 results mentioned above.
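
To make the classification above concrete, here is a minimal sketch in Python (my actual scripts are VBScript; the function and parameter names here are invented for illustration):

  def classify(is_spam, weight):
      # weight: the points this single test contributed to the message
      if weight == 0:
          return "no result"                                        # case 5
      if is_spam:
          # positive weight on a spam message is correct, negative is wrong
          return "correct (spam)" if weight > 0 else "wrong (spam)"      # 1 / 2
      # negative weight on a legit message is correct, positive is wrong
      return "correct (legit)" if weight < 0 else "wrong (legit)"        # 3 / 4

For example, classify(True, 4) models SPAMCOP firing on spam (case 1), and classify(False, 3) models REVDNS firing on a legit message (case 4).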
 
On the report (link above) you can see these 5 possible results, both as absolute numbers and as relative values in the diagram:
  1. dark green
  2. dark red
  3. light green
  4. light red
  5. grey
The more green you can see, the better the test is. The red bars indicate that this test has counted in the opposite direction from the final weight. (You can move the mouse pointer over a bar to show the percentage.)
 
If a certain test has no false positives over several days, weeks, or months, you can increase its weight to near your HOLD weight or even above it. But such tests are very rare. Good tests have a good detection rate and very few false positives, SPAMCOP for example.
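
The above is a rule of thumb rather than an exact formula, so treat this Python sketch as just one possible way to write it down (the names and the safety factor are my own invention):

  def suggest_weight(wrong_results, correct_results, hold_weight):
      # no wrong results over the whole observation window: the weight
      # may approach the HOLD weight (or go above it, at your own risk)
      if wrong_results == 0 and correct_results > 0:
          return hold_weight
      # otherwise scale the weight down with the share of correct results;
      # the factor 0.5 is an arbitrary safety margin to stay below HOLD
      share = correct_results / (correct_results + wrong_results)
      return int(hold_weight * share * 0.5)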
 
 
My scripts, applications, and the database for all this research are a work in progress, and I have a lot of ideas still to implement. For example, I've added a report that shows the mail from, to, and subject for every message where a certain test had the wrong result, so I can see whether this test's failing has some effect or can be ignored.
 
The report above shows the results for one business day, but I can also create average values over several days or weeks. The next thing I plan is a diagram containing the daily results for one single test, so I can see whether the quality of this test changes over time (goes up, down, ...) and whether the weight should be adapted.
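
The aggregation behind such a per-test diagram is simple; here is a Python sketch (the (date, test, outcome) row layout is an assumption, not the real log schema):

  from collections import defaultdict

  def daily_trend(rows, test_name):
      # rows: iterable of (date, test, outcome) tuples, where outcome is
      # one of the five result categories described above
      trend = defaultdict(lambda: defaultdict(int))
      for date, test, outcome in rows:
          if test == test_name:
              trend[date][outcome] += 1
      return trend  # date -> {outcome: count}, ready to chart over time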
 
Unfortunately I can't code this into a redistributable application. My VBScripts are not very fast (they would be much faster without the error checking for corrupt logfile lines), and parsing through 10 MB logfiles, analyzing the individual results, saving them into a database (MS SQL Server), and creating all the necessary joins takes several minutes with high CPU usage.
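
For what it's worth, the tolerant parsing step itself is cheap to express. A Python sketch (the tab-separated field layout here is invented; the real logfile format differs):

  def parse_line(line):
      fields = line.rstrip("\n").split("\t")
      if len(fields) < 4:
          return None   # corrupt line: skip it instead of crashing
      msgid, test, weight, verdict = fields[:4]
      try:
          return msgid, test, int(weight), verdict
      except ValueError:
          return None   # non-numeric weight field: also treat as corrupt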
 
I'm sure a good programmer and compiler could turn this into a small, fast application. But at the moment I see this as research into what's worth analyzing and searching for.
 
 
Finally, some comments on previous posts:
  • 37% is way too much, even if the remaining 63% (not 73%, Scott :-) are correct results. Remove this test!
  • Some "old" test like REVDNS or HELOBOGUS seem sto have an unexpected high rate of wrong results. I've decreased their weight since I've discovered this.
  • Regarding the terminology of false positives: I agree with Dan that a single test can't create a false positive (unless its own weight is higher than the HOLD weight), so a test failing in its result should be interpreted as a "wrong result". A "false positive" is a legit message in your spam folder. A "false negative" is a spam message in your mailbox.
Hope my "english" is not too terrible ;-)
Markus
