https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #7 from Justin Mason <[email protected]>  2009-08-17 15:21:17 PST ---
OK, I think I've ironed out a couple of issues.  Let's see what people think of
these sample scores:

http://taint.org/x/2009/gen-set0-2.0-5.0-500-ga_scores
http://taint.org/x/2009/gen-set1-5.0-5.0-500-ga_scores
http://taint.org/x/2009/gen-set2-2.0-5.0-500-ga_scores
http://taint.org/x/2009/gen-set3-5.0-5.0-500-ga_scores


Here are the test results against the "test" fold for each scoreset:

gen-set0-2.0-5.0-500-ga/test
Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  26453  99.07%
# Correctly spam:      83369  81.53%
# False positives:       249  0.93%
# False negatives:     18882  18.47%
# TCR(l=50): 3.263469  SpamRecall: 81.534%  SpamPrec: 99.702%


gen-set1-5.0-5.0-500-ga/test
Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  26646  99.79%
# Correctly spam:     100943  98.72%
# False positives:        56  0.21%
# False negatives:      1308  1.28%
# TCR(l=50): 24.890701  SpamRecall: 98.721%  SpamPrec: 99.945%


gen-set2-2.0-5.0-500-ga/test
Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  26485  99.19%
# Correctly spam:      84218  82.36%
# False positives:       217  0.81%
# False negatives:     18033  17.64%
# TCR(l=50): 3.540179  SpamRecall: 82.364%  SpamPrec: 99.743%


gen-set3-5.0-5.0-500-ga/test
Reading scores from "tmprules"...
Reading per-message hit stat logs and scores...

# SUMMARY for threshold 5.0:
# Correctly non-spam:  26662  99.85%
# Correctly spam:     100964  98.74%
# False positives:        40  0.15%
# False negatives:      1287  1.26%
# TCR(l=50): 31.107697  SpamRecall: 98.741%  SpamPrec: 99.960%
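
For anyone who wants to sanity-check the summary lines, here is a minimal
sketch that recomputes them from the four raw counts.  It assumes the usual
total cost ratio definition, TCR = nspam / (lambda * FP + FN) with lambda = 50,
which does reproduce the figures above; the function and parameter names are
made up for illustration and are not taken from the SpamAssassin scripts.

```python
def summarize(tn, tp, fp, fn, lam=50):
    """Recompute the summary metrics from raw counts.

    tn/tp/fp/fn are the "Correctly non-spam", "Correctly spam",
    "False positives" and "False negatives" counts.  `lam` is the
    false-positive cost weight (the l=50 in the TCR line).
    Hypothetical helper, not part of the SpamAssassin tooling.
    """
    n_spam = tp + fn                       # all spam in the test fold
    n_ham = tn + fp                        # all non-spam in the test fold
    return {
        "fp_pct": 100.0 * fp / n_ham,      # share of ham misclassified
        "fn_pct": 100.0 * fn / n_spam,     # share of spam missed
        "recall": 100.0 * tp / n_spam,     # SpamRecall
        "precision": 100.0 * tp / (tp + fp),  # SpamPrec
        # Assumed TCR definition: one FP costs as much as `lam` FNs.
        "tcr": n_spam / (lam * fp + fn),
    }


if __name__ == "__main__":
    # Counts copied from the set0 and set3 summaries above.
    for name, counts in [
        ("set0", (26453, 83369, 249, 18882)),
        ("set3", (26662, 100964, 40, 1287)),
    ]:
        m = summarize(*counts)
        print(f"{name}: TCR(l=50): {m['tcr']:.6f}  "
              f"SpamRecall: {m['recall']:.3f}%  SpamPrec: {m['precision']:.3f}%")
```

With the set0 counts this gives a TCR of about 3.26, and with the set3 counts
about 31.1, in line with the reported values.  Because each false positive is
weighted 50 times, the 249 FPs in set0 account for 12450 of the 31332 cost
units in the denominator.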

Yes, set0 and set2 are terrible: roughly 82% spam recall, against about 98.7%
for set1 and set3.  This is pretty much what happened last time, too; our
ruleset is pretty crappy nowadays without network rules active.  But the net
rule results are very good!  However, I think I need to look into the local
rule GA runs if possible.

Bug 5270 is the 3.2.0 rescoring run, for reference.

Spamhaus will be happy to see a much improved score for RCVD_IN_PBL ;)

gen-set1-5.0-5.0-500-ga_scores:score RCVD_IN_PBL                    2.596
gen-set3-5.0-5.0-500-ga_scores:score RCVD_IN_PBL                    2.411
