https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #90 from Mark Martinec 2009-10-09 06:49:27 PDT ---
To assess the quality and repeatability of results, here are the summaries
for all four score sets. Each pair consists of a normal run on 90% of the
log entries and a test run on the remaining 10%.

The most interesting figures are the FP and FN percentages, e.g. 0.028% and
0.961% in this clipping:

# False positives:        65   0.011%  (0.028% of nonspam, 10580 weighted)
# False negatives:      3411   0.578%  (0.961% of spam, 12054 weighted)

==========================================
gen-set0-5-5.0-25000-ga
SCORESET 0: (no net, no bayes)

test (10%):
# SUMMARY for threshold 5.0:
# Correctly non-spam:  45335  98.03%
# Correctly spam:      39320  81.61%
# False positives:       913   1.97%
# False negatives:      8860  18.39%
# TCR(l=50): 0.883875  SpamRecall: 81.611%  SpamPrec: 97.731%

scores (90%):
# SUMMARY for threshold 5.0:
# Correctly non-spam: 365397  48.193%  (98.401% of non-spam corpus)
# Correctly spam:     314466  41.476%  (81.286% of spam corpus)
# False positives:      5936   0.783%  (1.599% of nonspam, 173347 weighted)
# False negatives:     72396   9.548%  (18.714% of spam, 226867 weighted)
# Average score for spam:  10.0  nonspam: 1.4
# Average for false-pos:   5.6  false-neg: 3.1
# TOTAL:              758195  100.00%

==========================================
gen-set1-10-5.0-30000-ga
SCORESET 1: (net, no bayes)

test:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  46183  99.86%
# Correctly spam:      46648  96.82%
# False positives:        65   0.14%
# False negatives:      1532   3.18%
# TCR(l=50): 10.075282  SpamRecall: 96.820%  SpamPrec: 99.861%

scores:
# SUMMARY for threshold 5.0:
# Correctly non-spam: 370804  48.906%  (99.858% of non-spam corpus)
# Correctly spam:     374579  49.404%  (96.825% of spam corpus)
# False positives:       529   0.070%  (0.142% of nonspam, 31804 weighted)
# False negatives:     12283   1.620%  (3.175% of spam, 39385 weighted)
# Average score for spam:  17.4  nonspam: 0.4
# Average for false-pos:   5.8  false-neg: 3.2
# TOTAL:              758195  100.00%

==========================================
gen-set2-10-5.0-30000-ga
SCORESET 2: (no net, bayes)

test:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  29308  99.78%
# Correctly spam:      42344  95.69%
# False positives:        64   0.22%
# False negatives:      1907   4.31%
# TCR(l=50): 8.664774  SpamRecall: 95.690%  SpamPrec: 99.849%

scores:
# SUMMARY for threshold 5.0:
# Correctly non-spam: 234375  39.745%  (99.864% of non-spam corpus)
# Correctly spam:     339736  57.612%  (95.700% of spam corpus)
# False positives:       320   0.054%  (0.136% of nonspam, 26164 weighted)
# False negatives:     15265   2.589%  (4.300% of spam, 58794 weighted)
# Average score for spam:  10.4  nonspam: 0.6
# Average for false-pos:   5.4  false-neg: 3.9
# TOTAL:              589696  100.00%

==========================================
gen-set3-20-5.0-20000-ga
SCORESET 3: (net, bayes)

test:
# SUMMARY for threshold 5.0:
# Correctly non-spam:  29342  99.90%
# Correctly spam:      43843  99.08%
# False positives:        30   0.10%
# False negatives:       408   0.92%
# TCR(l=50): 23.192348  SpamRecall: 99.078%  SpamPrec: 99.932%

scores:
# SUMMARY for threshold 5.0:
# Correctly non-spam: 234630  39.788%  (99.972% of non-spam corpus)
# Correctly spam:     351590  59.622%  (99.039% of spam corpus)
# False positives:        65   0.011%  (0.028% of nonspam, 10580 weighted)
# False negatives:      3411   0.578%  (0.961% of spam, 12054 weighted)
# Average score for spam:  18.5  nonspam: -0.1
# Average for false-pos:   5.4  false-neg: 3.5
# TOTAL:              589696  100.00%
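
For anyone who wants to cross-check the TCR, SpamRecall and SpamPrec lines of the test runs, here is a minimal Python sketch that recomputes them from the four raw counts. It is not part of the mass-check tooling: the names `Counts`, `tcr`, `spam_recall` and `spam_precision` are made up for illustration. The TCR formula used, (number of spam) / (lambda * FP + FN) with lambda = 50, is an assumption, chosen because it matches the "TCR(l=50)" values reported for all four score sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw message counts from one '# SUMMARY' block (hypothetical container)."""
    ham_ok: int   # Correctly non-spam
    spam_ok: int  # Correctly spam
    fp: int       # False positives (ham classified as spam)
    fn: int       # False negatives (spam classified as ham)


def spam_recall(c: Counts) -> float:
    """Percentage of all spam that was caught."""
    return 100.0 * c.spam_ok / (c.spam_ok + c.fn)


def spam_precision(c: Counts) -> float:
    """Percentage of messages flagged as spam that really were spam."""
    return 100.0 * c.spam_ok / (c.spam_ok + c.fp)


def tcr(c: Counts, lam: float = 50.0) -> float:
    """Total cost ratio, assuming each FP costs `lam` times as much as an FN."""
    return (c.spam_ok + c.fn) / (lam * c.fp + c.fn)


# Counts copied from the four test (10%) runs in this comment.
TEST_RUNS = {
    0: Counts(ham_ok=45335, spam_ok=39320, fp=913, fn=8860),
    1: Counts(ham_ok=46183, spam_ok=46648, fp=65, fn=1532),
    2: Counts(ham_ok=29308, spam_ok=42344, fp=64, fn=1907),
    3: Counts(ham_ok=29342, spam_ok=43843, fp=30, fn=408),
}

if __name__ == "__main__":
    for n, c in TEST_RUNS.items():
        print(
            f"set{n}: TCR(l=50)={tcr(c):.6f}  "
            f"SpamRecall={spam_recall(c):.3f}%  "
            f"SpamPrec={spam_precision(c):.3f}%"
        )
```

With lambda = 50, one false positive weighs as much as fifty false negatives, which is why set 0 ends up with a TCR below 1 despite catching over 80% of spam.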
