* KELEMEN Peter ([EMAIL PROTECTED]) [20030702 16:58]:

[ Please Cc: me on follow-ups, since I'm no longer subscribed. Thanks. ]

> From the results, it is clear to me that Bayesian spam filtering
> alone is still not good enough to catch most of spam.  If time
> permits, I'll look into CRM114 and others.

Since the infamous SpamAssassin/Osirusoft incident, I have switched
over to bogofilter (starting with 0.14.5.2 and regularly updated up to
0.16.2) with no preliminary training.  I chose to train it on my
regular incoming mail, accepting the burden of a lot of spam in the
first couple of days.  Well, I have to say I'm impressed.  Let the
numbers speak for themselves:

Sampling period:        2003/08/27 -- 2003/10/27 (8 weeks)
Total incoming mails:   37822 (100.00%)
Total incoming ham:     31706 ( 83.83%)
Total incoming spam:    6116  ( 16.17%)

Number of spam:         6116  (100.00%)
Spam caught:            5439  ( 88.93%)
False negatives:        677   ( 11.07%)
False positives:        3     (  0.50%)

Production period:      2003/10/27 -- 2004/01/14 (10 weeks)
Total incoming mails:   66967 (100.00%)
Total incoming ham:     55246 ( 82.50%)
Total incoming spam:    11721 ( 17.50%)

Number of spam:         11721 (100.00%)
Spam caught:            11163 ( 95.24%)
False negatives:        561   (  4.79%)
False positives:        3     (  0.03%)
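The percentage columns are plain shares of the raw counts, so they can be checked mechanically. The sketch below recomputes them; `pct` and `shares` are hypothetical helper names of my own. "Spam caught" and "false negatives" are taken as shares of all spam, as in the tables. The false-positive figure is computed against all ham, which is the usual definition of a false-positive rate. That denominator is an assumption and is not necessarily the one used for the figures above.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def shares(ham: int, spam: int, caught: int, false_neg: int, false_pos: int) -> dict:
    """Recompute percentage columns from raw mail counts (hypothetical helper).

    Caught and false negatives are shares of all spam, as in the tables.
    False positives are a share of all ham: the conventional
    false-positive-rate denominator, assumed here.
    """
    total = ham + spam
    return {
        "ham": pct(ham, total),
        "spam": pct(spam, total),
        "caught": pct(caught, spam),
        "false_neg": pct(false_neg, spam),
        "fp_rate_of_ham": pct(false_pos, ham),
    }


# Counts copied from the production-period table above.
production = shares(ham=55246, spam=11721, caught=11163, false_neg=561, false_pos=3)
for name, value in production.items():
    print(f"{name:>15}: {value:6.2f}%")
```

Fed the sampling-period counts instead, the caught and false-negative shares come out at 88.93% and 11.07%, matching the first table.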


This supports my "theory" that my earlier bogofilter tests (run
while I was still using SpamAssassin in production) were flawed
because I trained bogofilter on a lot of *old* spam, which skewed
the values in the wrong direction.

Peter (now a happy bogofilter user)

-- 
    .+'''+.         .+'''+.         .+'''+.         .+'''+.         .+''
 Kelemen Péter     /       \       /       \       /    [EMAIL PROTECTED]
.+'         `+...+'         `+...+'         `+...+'         `+...+'