RE: [IMail Forum] February 2003 Spam Statistics

Markus Gufler Tue, 04 Mar 2003 16:34:31 -0800

> 
> WEIGHT10         99.63%       [FP:~1%]
> WEIGHT20         98.48%       [FP:~5%]
> SPAMCHK          95.50%       [FP:~50%]
> SNIFFER          94.98%       [FP:~2%]
> CYBERSITTER      94.06%       [FP:~5%]
>
> ...
> 
> (for example, the false positive rates for SPAMCHK, SNIFFER, and
> CYBERSITTER are disproportionately high, as they examine the content of
> E-mail, and we receive a lot of legitimate E-mail that has copies of spam
> in it).


Hi Scott,
We're very happy to see SPAMCHK already in first place. However, we think
that users should also know how SPAMCHK returns its result to the
Declude weighting system. This can also explain why SPAMCHK has reached
a higher value than professional solutions like SNIFFER or CYBERSITTER,
and why there can be an FP rate of ~50%.

We've designed SPAMCHK to return the result of all its content-based tests
to Declude as a weight. SPAMCHK should be configured as an external test,
like this:

        SPAMCHK external weight "C:\spamchkpath\spamchk.exe"

So the result reported to Declude is not a simple 0 ("no, we think this is
not spam") or 1 ("yes, we consider this spam").
Rather, we see SPAMCHK as a small weighting system inside the Declude
weighting system. Our return codes can range from -255 to +255 points
(SPAMCHK can also produce greater values, but 255 is the limit for
application return codes that SPAMCHK can report to Declude). The value
of every single test and keyword can be configured.
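To make the return-code range concrete, here is a minimal Python sketch of how per-test weights could be summed and then limited to the -255 to +255 range described above. This is not SPAMCHK's actual code: the function names and the sample weights are hypothetical, and how the value is finally handed to Declude as a process return code is not shown.

```python
from typing import Mapping

# Limit for return codes named in the message above.
RETURN_CODE_LIMIT = 255


def clamp_weight(total: int, limit: int = RETURN_CODE_LIMIT) -> int:
    """Restrict a combined weight to the range -limit..+limit."""
    return max(-limit, min(limit, total))


def combined_weight(hits: Mapping[str, int]) -> int:
    """Sum the configured weights of all tests/keywords that matched,
    then clamp the sum to the reportable range."""
    return clamp_weight(sum(hits.values()))


# Hypothetical matches: two spam keywords and one "legitimate" keyword.
example_hits = {"spam_keyword_a": 120, "spam_keyword_b": 200, "local_name": -30}
print(combined_weight(example_hits))
```

The raw sum in this example is 290, which is above the limit, so the reported value is capped at 255.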

So what can happen:
SPAMCHK can find only a small spam attribute and report it with a
small weight (for example 5% of the hold value).
On the other hand, SPAMCHK is also able to return a weight that's high
enough to trigger the hold action.
And that's not all: SPAMCHK can also return a negative weight if the
message shows typical characteristics of legitimate mail.
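The three cases above can be sketched as follows. This is only an illustration: the hold weight of 20 is an assumed site setting, and the function names and category labels are hypothetical, not part of SPAMCHK or Declude.

```python
def percent_of_hold(weight: int, hold_weight: int) -> float:
    """Express a returned weight as a percentage of the hold level."""
    return 100.0 * weight / hold_weight


def describe(weight: int, hold_weight: int) -> str:
    """Classify a single returned weight into the cases from the text."""
    if weight < 0:
        return "credit"   # typical legitimate-mail characteristics found
    if weight == 0:
        return "none"     # nothing found either way
    if weight >= hold_weight:
        return "hold"     # high enough to trigger the hold action alone
    return "partial"      # only a small spam attribute found


HOLD_WEIGHT = 20  # assumed value; every site configures its own

for w in (1, 25, -5):
    print(w, percent_of_hold(w, HOLD_WEIGHT), describe(w, HOLD_WEIGHT))
```

With the assumed hold weight of 20, a returned weight of 1 corresponds to the "5% of the hold value" case mentioned above.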

We've spent some time finding typical negative keywords. These are
keywords that we can't share because most of them are locally relevant
names (names that a spammer can't know about, like geographical
names...). Every SPAMCHK admin must find his own set. With these negative
keywords we were able to level out a lot of FPs created by other tests
(like REVDNS, NOABUSE...).
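As a rough illustration of how negative keywords can level out a false positive, the sketch below adds a negative SPAMCHK weight to the weights of other tests. All numbers, the keyword list, and the rule "hold when the total reaches the hold weight" are assumptions made for this example, and the keyword matching is deliberately naive.

```python
from typing import Mapping

HOLD_WEIGHT = 20  # assumed hold level

# Hypothetical local names with negative weights; every admin needs his own.
NEGATIVE_KEYWORDS = {"examplevalley": -5, "exampletown": -3}


def negative_keyword_weight(body: str, keywords: Mapping[str, int]) -> int:
    """Sum the weights of all negative keywords found in the message body."""
    text = body.lower()
    return sum(weight for word, weight in keywords.items() if word in text)


def is_held(test_weights: Mapping[str, int], hold_weight: int) -> bool:
    """Assumed rule: the message is held once the total reaches the hold level."""
    return sum(test_weights.values()) >= hold_weight


body = "Minutes of the Exampletown council meeting in Examplevalley"
other_tests = {"REVDNS": 12, "NOABUSE": 10}  # hypothetical weights
with_spamchk = dict(other_tests)
with_spamchk["SPAMCHK"] = negative_keyword_weight(body, NEGATIVE_KEYWORDS)

print(is_held(other_tests, HOLD_WEIGHT))   # other tests alone
print(is_held(with_spamchk, HOLD_WEIGHT))  # including the negative weight
```

In this made-up case the other tests alone reach 22 points, while the negative keywords bring the total down to 14, below the assumed hold level.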

So what we mean is: both the 95.5% and the 50% FP rate should be seen in
this light.
On our system we were able to increase the detection rate by around 45%
(see attached diagram: values are in % of the hold level).

The most important thing: every admin must adapt his own configuration
to get the best out of SPAMCHK.
We are working on a new keyword handling that will allow admins to receive
regular updates (for example from Kami's keyword DB) while still keeping
their own "personal" settings.

One last thing: we've also begun to block hoax warnings by giving
typical phrases a very high weight. It's still incredible how many
hoaxes flit through cyberspace...

Please excuse the long message and the poor English.
Markus

returncode_declude_spamchk.PDF
Description: Adobe PDF document
