https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #156 from Warren Togami <[email protected]> 2009-11-11 15:42:49 UTC ---
I might have to eat my words.  Applying the new GA scores made my own
statistics worse: false positives nearly doubled and TCR dropped.

ORIGINAL SCORES
./fp-fn-statistics  -s 3 (wt-* 20091107 weekly logs)

# SUMMARY for threshold 5.0:
# Correctly non-spam:  29677  99.82%
# Correctly spam:      21106  90.42%
# False positives:        54  0.18%
# False negatives:      2235  9.58%
# TCR(l=50): 4.729686  SpamRecall: 90.425%  SpamPrec: 99.745%

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c146
GA SCORES
./fp-fn-statistics  -s 3 (wt-* 20091107 weekly logs)

# SUMMARY for threshold 5.0:
# Correctly non-spam:  29624  99.64%
# Correctly spam:      21039  90.14%
# False positives:       107  0.36%
# False negatives:      2302  9.86%
# TCR(l=50): 3.050314  SpamRecall: 90.138%  SpamPrec: 99.494%
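The summary figures above can be recomputed from the raw counts. This is a minimal sketch, assuming TCR(l=50) is the usual total cost ratio, total spam / (λ·FP + FN) with λ = 50, and that SpamRecall and SpamPrec are the standard recall and precision on spam. The `summarize` helper is hypothetical and is not part of fp-fn-statistics.

```python
def summarize(spam_ok, fp, fn, lam=50):
    """Recompute TCR, spam recall (%) and spam precision (%) from raw counts.

    Assumes TCR = total spam / (lam * FP + FN), i.e. each false positive
    costs lam times as much as a false negative.
    """
    total_spam = spam_ok + fn
    tcr = total_spam / (lam * fp + fn)
    recall = 100.0 * spam_ok / total_spam
    precision = 100.0 * spam_ok / (spam_ok + fp)
    return tcr, recall, precision

# Counts copied from the two SUMMARY blocks above (threshold 5.0).
original = summarize(spam_ok=21106, fp=54, fn=2235)
ga = summarize(spam_ok=21039, fp=107, fn=2302)

for name, (tcr, recall, prec) in (("ORIGINAL", original), ("GA", ga)):
    print(f"{name}: TCR(l=50): {tcr:.6f}  "
          f"SpamRecall: {recall:.3f}%  SpamPrec: {prec:.3f}%")
```

With λ = 50, the 53 extra false positives under the GA scores add 2,650 to the denominator, against only 67 extra false negatives. That is why TCR falls from about 4.73 to 3.05 while spam recall barely moves.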

(In reply to comment #153)
> Created an attachment (id=4568)
> --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4568)
> Checker for rules that match more ham than spam
> 
> Collected selections from several more runs of my script.  I took the last
> three days' worth of masschecks plus the run last week, hand-picked rules with
> a high score (~1.0+) but low S/O (~0.250-), and then looked for repeat
> offenders.  This is the list, with each rule's worst S/O of any run:
> 
>  S/O RANK HAM%    SPAM%   Score (attachment 4565)            Rule
> .195 .29  0.8975  0.2173  1.799 0.579 0.901 0.882  HTML_IMAGE_RATIO_06

score HTML_IMAGE_RATIO_02 2.199 0.805 1.200 0.437
score HTML_IMAGE_RATIO_04 2.089 0.610 0.607 0.556
score HTML_IMAGE_RATIO_06 1.799 0.579 0.901 0.882
score HTML_IMAGE_RATIO_08 1.410 0.351 0.874 0.021

Is it logical to zero out HTML_IMAGE_RATIO_06 when its sibling
HTML_IMAGE_RATIO rules keep substantial scores?  It feels like either our
corpus sample was not large and varied enough, or we are doing something else
wrong.  These particular rules had much lower scores in the 3.2.0 GA.

>  S/O RANK HAM%    SPAM%   Score (attachment 4565)            Rule
> .241 .34  1.4248  0.4529  1.0                      EXTRA_MPART_TYPE

I suppose this is the clearest case of a rule we should zero out.
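The S/O values in the quoted table are consistent with SPAM% / (SPAM% + HAM%): for HTML_IMAGE_RATIO_06, 0.2173 / (0.2173 + 0.8975) ≈ 0.195. The following is a minimal sketch of the selection described in comment #153 (high score, low S/O). It assumes cutoffs of score ≥ 1.0 and S/O ≤ 0.25 and takes the highest score across score sets; the `Rule` type and `flag_suspects` helper are hypothetical and are not the attached checker script.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    ham_pct: float   # percentage of ham messages the rule hits
    spam_pct: float  # percentage of spam messages the rule hits
    scores: tuple    # score(s) assigned to the rule, one per score set

    @property
    def s_o(self) -> float:
        # Share of the rule's (per-class) hit rate that comes from spam.
        return self.spam_pct / (self.spam_pct + self.ham_pct)

def flag_suspects(rules, min_score=1.0, max_s_o=0.25):
    """Return rules with a high score but a low S/O (hypothetical cutoffs)."""
    return [r for r in rules
            if max(r.scores) >= min_score and r.s_o <= max_s_o]

# HAM%, SPAM% and scores copied from the quoted table rows.
rules = [
    Rule("HTML_IMAGE_RATIO_06", 0.8975, 0.2173, (1.799, 0.579, 0.901, 0.882)),
    Rule("EXTRA_MPART_TYPE", 1.4248, 0.4529, (1.0,)),
]

for r in flag_suspects(rules):
    print(f"{r.s_o:.3f}  {max(r.scores):.3f}  {r.name}")
```

Because S/O is computed from per-class hit percentages rather than raw hit counts, it does not depend on the ham/spam mix of the corpus.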
