I
think it's not possible to calculate the weight of an individual test strictly
from its catch/failure rate.
On http://www.zcom.it/spamtest/ you
can see what we generate from our daily logfiles.
In my opinion it's not enough to count correct or incorrect
results.
Theoretically there are 5 possible results for every
individual test: a correct positive weight (spam caught), a wrong positive
weight (a false positive), a correct negative weight (legit mail recognized),
a wrong negative weight, or no result at all (zero).
In practice most spam tests have only 3 possible results,
because they count only as a positive or only as a negative test. For example,
SPAMCOP can't fail on a spam message because its result is either a "positive
weight" or "no weight" (unless you decide to assign a negative weight whenever
SPAMCOP returns no positive result, which I don't consider here).
Another test, like NOLEGITCONTENT, will only subtract
points, or return zero as result if NO-LEGIT-CONTENT was
found.
Some tests, like SPAMCHK, can have a positive weight, a negative
weight or zero as result, and so can produce all 5 results mentioned
above.
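The classification above can be sketched in code. This is my own illustrative sketch, not Declude's actual log format or logic; the function name and inputs are assumptions:

```python
# Hypothetical sketch: classify one test result into the 5 categories
# described above. The inputs are assumptions, not the real log format.

def classify(test_weight, message_is_spam):
    """test_weight: points the test added (positive, negative, or 0).
    message_is_spam: the final verdict for the message."""
    if test_weight == 0:
        return "no result"  # test did not trigger at all
    if test_weight > 0:
        # positive weight: correct on spam, wrong on legit mail
        return "correct positive" if message_is_spam else "false positive"
    # negative weight: correct on legit mail, wrong on spam
    return "correct negative" if not message_is_spam else "false negative"
```

A positive-only test like SPAMCOP can then only ever produce "correct positive", "false positive" or "no result", while a test like SPAMCHK can hit all five branches.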
On the report (link above) you can see these 5 possible
results, both as absolute numbers and as relative values in the
diagram:
The more green you see, the better a test is. The
red bars indicate that the test counted in the opposite direction from the
final weight. (You can move the mouse pointer over a bar to show the
percentage.)
If a certain test has no false positives over several
days, weeks or months, you can increase its weight to near your hold weight or
even above it. But such tests are very rare. A good test has a high detection rate and
very few false positives, for example SPAMCOP.
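One way the idea above could be expressed as a rule of thumb: a test with zero observed false positives may carry a weight near the hold weight, while any false-positive rate pushes the weight well below it. The cap factors here are purely my own illustrative choices, not a Declude recommendation:

```python
# Hypothetical weighting heuristic for the idea described above.
# The scaling factor (0.5 minus the false-positive rate) is an
# illustrative assumption, not a tested rule.

def suggested_weight(false_positives, correct_positives, hold_weight):
    total = false_positives + correct_positives
    if total == 0:
        return 0                 # no data yet: give the test no weight
    if false_positives == 0:
        return hold_weight       # proven safe: weight near the hold weight
    fp_rate = false_positives / total
    # scale the weight down as the false-positive rate grows
    return round(hold_weight * max(0.0, 0.5 - fp_rate))
```

With a hold weight of 10, a test with 0 false positives out of 500 hits would get the full 10 points, while one that is wrong 10% of the time would get only 4.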
My scripts, applications and the database for all this
research are a work in progress, and I have a lot of ideas to implement. For
example, I've added a report showing the mail-from, mail-to and subject for every
message where a certain test had the wrong result. So I can see whether this
test, when it fails, has some effect or can be ignored.
The report above shows the results for one business day.
But I can also create average values over several days or weeks. The next thing I
plan is a diagram containing the daily results for one single test, so
I can see whether the quality of this test changes over time (goes up, down, ...) and
so whether the weight should be adapted.
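The planned per-test trend view could be sketched like this: aggregate each day's correct and wrong counts for one test into a daily accuracy percentage, and watch whether the series rises or falls. The input format is my own assumption, not the real log or database schema:

```python
# Hypothetical sketch of a per-test daily trend series.
# daily_counts is an assumed format, not the real MS SQL schema.

def daily_accuracy(daily_counts):
    """daily_counts: list of (date, correct, wrong) tuples for one test.
    Returns (date, accuracy_percent) per day, so the trend is visible."""
    series = []
    for date, correct, wrong in daily_counts:
        total = correct + wrong
        pct = 100.0 * correct / total if total else 0.0
        series.append((date, round(pct, 1)))
    return series
```

A falling series would suggest lowering that test's weight; a consistently high one would justify raising it, as described above.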
Unfortunately I can't turn this into a redistributable
application. My VBScripts are not very fast (they would be much faster without error
checking for corrupt logfile lines), and parsing through 10 MB logfiles, analyzing
the individual results, saving them into a database (MS SQL Server) and creating
all the necessary joins takes several minutes with high CPU
usage.
I'm sure a good programmer and compiler could turn this
into a small and fast application. But at the moment I see it as research
that is worth analyzing and exploring.
Finally, some comments on previous
posts:
Hope my "english" is not too terrible
;-)
Markus