On Wed, Feb 22, 2006 at 01:49:30PM +0000, Justin Mason wrote:
> a 20% FP rate, however, looks like something is buggy with the
> measurement/masses scripts rather than the scores.

Hrm, perhaps.  I was thinking that having all of the new rules at a score
of 1 is going to lead to FPs pretty quickly, depending on the corpus
involved.  Looking at my own nightly run stats, only 120 of 23298 hams
(0.51%) hit a score of 5 or above.  Output from the same script:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  23189  99.53%
# Correctly spam:     107918  68.14%
# False positives:       109  0.47%
# False negatives:     50462  31.86%
# TCR(l=50): 2.832666  SpamRecall: 68.139%  SpamPrec: 99.899%
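For anyone double-checking the numbers, the summary lines can be reproduced from the four raw counts. Below is a minimal Python sketch, not the masses code itself. It assumes TCR is computed as nspam / (l * FP + FN) with l = 50, which does give the 2.832666 printed above. The function name `summarize` is made up for illustration.

```python
def summarize(ham_ok, spam_ok, fp, fn, lam=50.0):
    """Recompute the threshold summary from raw counts.

    Assumption: TCR = nspam / (lam * FP + FN), i.e. one false positive
    costs `lam` times as much as one false negative.
    """
    nham = ham_ok + fp      # every ham is either correct or a false positive
    nspam = spam_ok + fn    # every spam is either caught or a false negative
    return {
        "ham_ok_pct": 100.0 * ham_ok / nham,
        "spam_ok_pct": 100.0 * spam_ok / nspam,
        "fp_pct": 100.0 * fp / nham,
        "fn_pct": 100.0 * fn / nspam,
        "tcr": nspam / (lam * fp + fn),
        "recall": 100.0 * spam_ok / nspam,
        "precision": 100.0 * spam_ok / (spam_ok + fp),
    }


s = summarize(ham_ok=23189, spam_ok=107918, fp=109, fn=50462)
print(f"# TCR(l=50): {s['tcr']:.6f}  SpamRecall: {s['recall']:.3f}%  "
      f"SpamPrec: {s['precision']:.3f}%")
# prints "# TCR(l=50): 2.832666  SpamRecall: 68.139%  SpamPrec: 99.899%"
```

Note how heavily l=50 weights FPs: 109 FPs cost as much as 5450 FNs in the TCR denominator.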

I noticed that quite a few rules are listed as being ignored which probably
shouldn't be:

[...]
ignoring 'EXCUSE_23': score and range == 0
ignoring 'FUZZY_MILF': score and range == 0
ignoring 'HIDE_WIN_STATUS': score and range == 0
[...]

Hrm.  I also just noticed that we have a bunch of 0-score rules in 3.2, which
isn't good.  /me fixes
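Finding 0-score rules like these can be scripted. Here is a rough Python sketch. It assumes plain SpamAssassin `score NAME value(s)` lines (one or four numbers, `#` comments) and that a later `score` line for the same rule overrides an earlier one. `zero_score_rules` and the RULE_* names in the sample are hypothetical. It only looks at scores, not at the ranges the masses scripts also check.

```python
import re

# Hypothetical matcher for lines like "score RULE_NAME 0" or "score RULE_NAME 0 0 1.2 1.2"
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(.+?)\s*$")


def zero_score_rules(config_text):
    """Return sorted names of rules whose final score values are all zero."""
    scores = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0]          # drop trailing comments
        m = SCORE_LINE.match(line)
        if not m:
            continue
        name, rest = m.group(1), m.group(2)
        try:
            values = [float(v) for v in rest.split()]
        except ValueError:
            continue                          # not a plain numeric score line
        scores[name] = values                 # later lines override earlier ones
    return sorted(n for n, vals in scores.items() if all(v == 0.0 for v in vals))


# Made-up sample config, for illustration only
sample = """
score RULE_A 0
score RULE_B 0 0 0 0   # disabled
score RULE_C 0
score RULE_C 1.5
score RULE_D 0 0.5 0 0.5
"""

print(zero_score_rules(sample))  # ['RULE_A', 'RULE_B']
```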

-- 
Randomly Generated Tagline:
"A state of war is not a blank check for the president when it comes to
 the rights of the nation's citizens."
         - Justice Sandra Day O'Connor, 2004-06-28

