On Wed, Feb 22, 2006 at 01:49:30PM +0000, Justin Mason wrote:
> a 20% FP rate, however, looks like something is buggy with the
> measurement/masses scripts rather than the scores.

Hrm, perhaps. I was thinking that having all of the new rules at a score
of 1 is going to lead to FPs pretty quickly, depending on the corpus
involved. Looking at my own nightly run stats, only 120 of 23298 hams
(0.51%) hit a score of 5 or above. Output from the same script:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  23189  99.53%
# Correctly spam:     107918  68.14%
# False positives:       109   0.47%
# False negatives:     50462  31.86%
# TCR(l=50): 2.832666  SpamRecall: 68.139%  SpamPrec: 99.899%
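
For anyone wanting to sanity-check numbers like these, the summary can be recomputed from the four raw counts. The following is only a sketch, not the script that produced the output: the function name `summarize` is made up for illustration, and it assumes TCR is the usual total cost ratio, nspam / (lambda * FP + FN), which does reproduce the 2.832666 figure here.

```python
def summarize(ham_ok, spam_ok, fp, fn, lam=50.0):
    """Recompute summary statistics from raw counts (hypothetical helper).

    ham_ok  -- hams scoring below the threshold (correctly non-spam)
    spam_ok -- spams scoring at or above the threshold (correctly spam)
    fp, fn  -- false positives and false negatives
    lam     -- assumed cost weight of one FP relative to one FN
    """
    n_ham = ham_ok + fp
    n_spam = spam_ok + fn
    return {
        "fp_pct": 100.0 * fp / n_ham,                  # share of ham misfiled
        "fn_pct": 100.0 * fn / n_spam,                 # share of spam missed
        "recall": 100.0 * spam_ok / n_spam,            # SpamRecall
        "precision": 100.0 * spam_ok / (spam_ok + fp), # SpamPrec
        "tcr": n_spam / (lam * fp + fn),               # assumed TCR definition
    }


# Counts taken from the summary output above.
stats = summarize(ham_ok=23189, spam_ok=107918, fp=109, fn=50462)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With lambda at 50, the 109 FPs weigh about as much in the TCR denominator as 5450 FNs would, so even a small rise in the FP rate moves TCR noticeably.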
I was noticing that there are quite a number of rules listed as being ignored
which probably shouldn't be:
[...]
ignoring 'EXCUSE_23': score and range == 0
ignoring 'FUZZY_MILF': score and range == 0
ignoring 'HIDE_WIN_STATUS': score and range == 0
[...]
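
To compare the ignored rules against the rule files, the names can be pulled out of the script output. A minimal sketch, assuming the messages always have exactly the form shown above; `ignored_rules` is a hypothetical helper, not part of the masses scripts.

```python
import re

# Matches lines such as: ignoring 'EXCUSE_23': score and range == 0
IGNORED_RE = re.compile(r"^ignoring '([^']+)': score and range == 0$")


def ignored_rules(lines):
    """Return the rule names from 'ignoring ...' lines, in input order."""
    names = []
    for line in lines:
        match = IGNORED_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


sample = [
    "[...]",
    "ignoring 'EXCUSE_23': score and range == 0",
    "ignoring 'FUZZY_MILF': score and range == 0",
    "ignoring 'HIDE_WIN_STATUS': score and range == 0",
    "[...]",
]
print(ignored_rules(sample))
```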
Hrm. I also just noticed that we have a bunch of 0-score rules in 3.2 which
isn't good. /me fixes
--
Randomly Generated Tagline:
"A state of war is not a blank check for the president when it comes to
the rights of the nation's citizens."
- Justice Sandra Day O'Connor, 2004-06-28
