https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #156 from Warren Togami <[email protected]> 2009-11-11 15:42:49 UTC ---
I might have to eat my words. Applying these new scores did not improve my own
statistics.

ORIGINAL SCORES
./fp-fn-statistics -s 3 (wt-* 20091107 weekly logs)
# SUMMARY for threshold 5.0:
# Correctly non-spam:  29677  99.82%
# Correctly spam:      21106  90.42%
# False positives:        54   0.18%
# False negatives:      2235   9.58%
# TCR(l=50): 4.729686  SpamRecall: 90.425%  SpamPrec: 99.745%

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c146
GA SCORES
./fp-fn-statistics -s 3 (wt-* 20091107 weekly logs)
# SUMMARY for threshold 5.0:
# Correctly non-spam:  29624  99.64%
# Correctly spam:      21039  90.14%
# False positives:       107   0.36%
# False negatives:      2302   9.86%
# TCR(l=50): 3.050314  SpamRecall: 90.138%  SpamPrec: 99.494%

(In reply to comment #153)
> Created an attachment (id=4568)
>  --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4568)
> Checker for rules that match more ham than spam
>
> Collected selections from several more runs of my script. I took the last
> three days' worth of masschecks plus the run last week, hand-picked rules with
> a high score (~1.0+) but low S/O (~0.250-), and then looked for repeat
> offenders. This is the list, with each rule's worst S/O of any run:
>
>  S/O  RANK   HAM%  SPAM%  Score attachment 4565    Rule
> .195   .29 0.8975 0.2173  1.799 0.579 0.901 0.882  HTML_IMAGE_RATIO_06

score HTML_IMAGE_RATIO_02 2.199 0.805 1.200 0.437
score HTML_IMAGE_RATIO_04 2.089 0.610 0.607 0.556
score HTML_IMAGE_RATIO_06 1.799 0.579 0.901 0.882
score HTML_IMAGE_RATIO_08 1.410 0.351 0.874 0.021

Is it logical to zero out HTML_IMAGE_RATIO_06 when these others have scores?
It feels like either our corpus sample was not large and varied enough, or
we are doing something else wrong. These particular rules had much lower scores
from the 3.2.0 GA.
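For reference, the S/O column in the quoted table can be reproduced from the HAM% and SPAM% columns. The sketch below assumes S/O is simply SPAM% / (SPAM% + HAM%); that definition is not stated in this thread, but it matches the quoted values to three decimals. The function name `s_o` and the `rows` dictionary are illustrative only and are not part of the masscheck tooling.

```python
def s_o(ham_pct: float, spam_pct: float) -> float:
    """Share of a rule's hits that land on spam (assumed S/O definition)."""
    total = ham_pct + spam_pct
    return spam_pct / total if total else 0.0


# (HAM%, SPAM%) pairs copied from the rows quoted in this comment.
rows = {
    "HTML_IMAGE_RATIO_06": (0.8975, 0.2173),
    "EXTRA_MPART_TYPE": (1.4248, 0.4529),
}

for rule, (ham_pct, spam_pct) in rows.items():
    print(f"{s_o(ham_pct, spam_pct):.3f}  {rule}")
```

An S/O this far below 0.5 means the rule hits ham more often than spam in relative terms, which is why a positive score on it looks suspicious.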
>  S/O  RANK   HAM%  SPAM%  Score attachment 4565    Rule
> .241   .34 1.4248 0.4529  1.0                      EXTRA_MPART_TYPE

I suppose this is the clearest case of a rule we should zero out.
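To make the two summaries at the top of this comment easier to compare, here is a rough sketch that re-derives the summary line from the raw counts. It assumes TCR = nspam / (l * FP + FN) with l=50, SpamRecall = TP / (TP + FN), and SpamPrec = TP / (TP + FP); these formulas are not quoted from fp-fn-statistics itself, but they reproduce the figures above. The function name `summarize` and its keyword names are hypothetical.

```python
def summarize(ham_ok: int, spam_ok: int, fp: int, fn: int, l: int = 50) -> dict:
    """Re-derive summary figures from raw counts (assumed formulas)."""
    nspam = spam_ok + fn          # all spam messages in the corpus
    nham = ham_ok + fp            # all ham messages in the corpus
    return {
        "fp_pct": 100.0 * fp / nham,
        "fn_pct": 100.0 * fn / nspam,
        "recall_pct": 100.0 * spam_ok / nspam,
        "prec_pct": 100.0 * spam_ok / (spam_ok + fp),
        # Each false positive is weighted l times as heavily as a false negative.
        "tcr": nspam / (l * fp + fn),
    }


original = summarize(ham_ok=29677, spam_ok=21106, fp=54, fn=2235)
ga = summarize(ham_ok=29624, spam_ok=21039, fp=107, fn=2302)

for name, s in (("ORIGINAL", original), ("GA", ga)):
    print(f"{name}: TCR(l=50): {s['tcr']:.6f} "
          f"SpamRecall: {s['recall_pct']:.3f}% SpamPrec: {s['prec_pct']:.3f}%")
```

With l=50 the false-positive term dominates the denominator, so going from 54 to 107 false positives accounts for most of the TCR drop even though recall barely changes.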
