[Bug 5270] 3.2.0 rescoring

bugzilla-daemon Sat, 17 Feb 2007 10:11:56 -0800

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5270

------- Additional Comments From [EMAIL PROTECTED]  2007-02-17 10:11 -------
GA results for set 3:

# SUMMARY for threshold 5.0:
# Correctly non-spam:  67494  99.92%
# Correctly spam:     117606  98.76%
# False positives:        56  0.08%
# False negatives:      1477  1.24%
# TCR(l=50): 27.842647  SpamRecall: 98.760%  SpamPrec: 99.952%

That beats the perceptron's 1.70% FN rate nicely ;)
The output directory is gen-set3-5.0-5.0-100-ga.

(In passing: I used my $400 Dell laptop to produce set 1 last night,
and it completed the GA run a lot faster than the zone did.  The zone
doesn't really provide decent CPU power any more.)

So, that's the lot (finally!).  To summarise, the results on the test sets are:

set 0
# Correctly non-spam:  66964  99.13%
# Correctly spam:     110426  92.73%
# False positives:       586  0.87%
# False negatives:      8657  7.27%
# TCR(l=50): 3.137313  SpamRecall: 92.730%  SpamPrec: 99.472%

set 1
# Correctly non-spam:  67347  99.70%
# Correctly spam:     114907  96.49%
# False positives:       203  0.30%
# False negatives:      4176  3.51%
# TCR(l=50): 8.312369  SpamRecall: 96.493%  SpamPrec: 99.824%

set 2
# Correctly non-spam:  67498  99.92%
# Correctly spam:     115160  96.71%
# False positives:        52  0.08%
# False negatives:      3923  3.29%
# TCR(l=50): 18.255864  SpamRecall: 96.706%  SpamPrec: 99.955%

set 3
# Correctly non-spam:  67494  99.92%
# Correctly spam:     117606  98.76%
# False positives:        56  0.08%
# False negatives:      1477  1.24%
# TCR(l=50): 27.842647  SpamRecall: 98.760%  SpamPrec: 99.952%
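
For anyone who wants to sanity-check the summary lines: they appear to
follow the usual definitions, TCR = nspam / (lambda * FP + FN) with
lambda = 50, SpamRecall = TP / (TP + FN) and SpamPrec = TP / (TP + FP).
The sketch below assumes those definitions; the function and variable
names are made up for illustration and are not taken from the
mass-check or GA tools.

```python
def summarise(ham_ok, spam_ok, fp, fn, lam=50):
    """Recompute the summary figures for one score set.

    ham_ok  -- correctly classified non-spam
    spam_ok -- correctly classified spam (true positives)
    fp, fn  -- false positives / false negatives
    lam     -- cost weight of a false positive (the "l=50" in the output)
    """
    nspam = spam_ok + fn            # total spam in the test set
    nham = ham_ok + fp              # total non-spam in the test set
    return {
        "tcr": nspam / (lam * fp + fn),
        "recall_pct": 100.0 * spam_ok / nspam,
        "prec_pct": 100.0 * spam_ok / (spam_ok + fp),
        "fp_pct": 100.0 * fp / nham,
        "fn_pct": 100.0 * fn / nspam,
    }


# Counts copied from the four summaries above:
# (correctly non-spam, correctly spam, false positives, false negatives)
SETS = {
    0: (66964, 110426, 586, 8657),
    1: (67347, 114907, 203, 4176),
    2: (67498, 115160, 52, 3923),
    3: (67494, 117606, 56, 1477),
}

if __name__ == "__main__":
    for n, counts in sorted(SETS.items()):
        r = summarise(*counts)
        print("set %d: TCR(l=50) %.6f  SpamRecall %.3f%%  SpamPrec %.3f%%"
              % (n, r["tcr"], r["recall_pct"], r["prec_pct"]))
```

Because a false positive is weighted 50 times as heavily as a false
negative, the drop from 586 FPs (set 0) to 56 FPs (set 3) accounts for
most of the TCR improvement.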

Please take a look at 50_scores.cf and see if you can spot any
issues.

The one thing I can see is that we now have lint failures in
trunk, because 50_scores.cf contains scores for rules that live
in rulesrc.  I'm not sure how to solve that.  Either:

- (a) stop treating "score set for unknown rule name" as a lint error that
  gets warned about, or

- (b) go through the rulesrc tree, find the rules that were in the active
  list and therefore now have scores, and mark them with "tflags publish"
  so they are always published to the active ruleset.

I'm leaning towards (b).
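
A rough sketch of how the candidates for (b) could be found, assuming
the usual "score NAME ..." and "tflags NAME flags..." line syntax.  The
helper names and the list of rule types are illustrative only, and the
script works on file contents passed in as strings rather than walking
the real rulesrc tree.  It only reports rule names; rules that already
have a tflags line would still need "publish" added by hand.

```python
import re

# Rule-definition keywords considered here; this list is an assumption
# and may not cover every rule type used in rulesrc.
RULE_TYPES = ("header", "body", "rawbody", "uri", "full", "meta")

DEF_RE = re.compile(r"^\s*(?:%s)\s+(\w+)\s" % "|".join(RULE_TYPES))
SCORE_RE = re.compile(r"^\s*score\s+(\w+)\s")
TFLAGS_RE = re.compile(r"^\s*tflags\s+(\w+)\s+(.*)$")


def scored_rules(scores_text):
    """Names of all rules that have a 'score' line (comments are skipped)."""
    names = set()
    for line in scores_text.splitlines():
        m = SCORE_RE.match(line)
        if m:
            names.add(m.group(1))
    return names


def needs_publish(scores_text, rules_text):
    """Rules defined in rules_text that have a score but no 'publish' tflag."""
    defined, published = set(), set()
    for line in rules_text.splitlines():
        m = DEF_RE.match(line)
        if m:
            defined.add(m.group(1))
            continue
        m = TFLAGS_RE.match(line)
        if m and "publish" in m.group(2).split():
            published.add(m.group(1))
    return sorted((defined & scored_rules(scores_text)) - published)
```

For example, given a scores file that scores FOO_RULE and BAR_RULE, and
a rules file where only BAR_RULE already carries "tflags BAR_RULE
publish", needs_publish() returns just FOO_RULE.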



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.