https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

Adam Katz <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #4564|0                           |1
        is obsolete|                            |

--- Comment #153 from Adam Katz <[email protected]> 2009-11-09 15:40:31 UTC ---
Created an attachment (id=4568)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4568)
Checker for rules that match more ham than spam

These are collected selections from several more runs of my script.  I took the
last three days' worth of masschecks plus last week's run, hand-picked rules
with a high score (roughly 1.0 or above) but a low S/O (roughly 0.250 or
below), and then looked for repeat offenders.  This is the list, showing each
rule's worst S/O across all runs:

 S/O RANK HAM%    SPAM%   Scores (attachment 4565) Rule
.002 .14  1.2650  0.0024  0.001 0.001 0.131 0.700  TVD_RCVD_SPACE_BRACKET
.002 .23  0.4472  0.0008  0.000 2.099 0.001 1.711  MISSING_MIME_HB_SEP
.019 .22  0.2529  0.0049  1.482 0.855 2.399 2.399  FUZZY_CPILL
.019 .29  0.2809  0.0056  0.001 1.699 1.498 1.699  X_IP
.046 .22  0.4010  0.0193  2.385 0.345 0.998 2.503  FRT_SOMA2
.077 .25  0.2643  0.0221  0.551 1.026 1.033 1.250  CTYPE_001C_B
.092 .21  0.8712  0.0878  0.699 0.332 0.480 0.800  MIME_BASE64_BLANKS
.095 .31  0.2735  0.0286  2.200 2.199 0.540 2.199  WEIRD_QUOTING
.178 .28  0.4948  0.1069  0 0.973 0 2.385          SPF_HELO_FAIL
.195 .29  0.8975  0.2173  1.799 0.579 0.901 0.882  HTML_IMAGE_RATIO_06
.241 .34  1.4248  0.4529  1.0                      EXTRA_MPART_TYPE
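
The selection described above can be sketched roughly as follows.  This is a
minimal illustration, not the attached script: the function names, the tuple
layout, the use of the highest listed score as "the score", and the hard
thresholds are all assumptions (the real list was hand-picked, so the cutoffs
were approximate).  S/O is computed as spam% / (spam% + ham%), which matches
the figures in the table.

```python
from collections import defaultdict


def s_o(spam_pct, ham_pct):
    """S/O: the share of a rule's hits that are spam, from hit percentages."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def repeat_offenders(runs, min_score=1.0, max_s_o=0.25, min_runs=2):
    """Return {rule: worst (lowest) S/O} for rules flagged in >= min_runs runs.

    Each run is a list of (rule, ham_pct, spam_pct, scores) tuples, where
    scores is the list of scores proposed for that rule (hypothetical layout).
    """
    flagged = defaultdict(list)
    for run in runs:
        for rule, ham_pct, spam_pct, scores in run:
            ratio = s_o(spam_pct, ham_pct)
            # Assumption: a rule counts as "high score" if any listed score
            # reaches the threshold.
            if max(scores) >= min_score and ratio <= max_s_o:
                flagged[rule].append(ratio)
    return {
        rule: min(ratios)
        for rule, ratios in flagged.items()
        if len(ratios) >= min_runs
    }


# Two rows from the table above, treated as a single run.
run = [
    ("FUZZY_CPILL", 0.2529, 0.0049, [1.482, 0.855, 2.399, 2.399]),
    ("EXTRA_MPART_TYPE", 1.4248, 0.4529, [1.0]),
]
print(repeat_offenders([run], min_runs=1))
```

With only one run supplied, min_runs=1 is needed to see any output; with the
default min_runs=2 a rule has to be flagged in at least two runs before it is
reported.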

I don't think it is wise to release with these scores so high.  I propose we
score them all 0.1 or 0.001 so as not to hold up the release, and bookmark the
issue (likely a bug in the GA, probably best registered as its own Bugzilla
bug) to deal with later.


Additionally, I've updated my script to do the reverse: seek out negatively
scored rules that hit more spam than ham.  This doesn't currently find anything
beyond SPF_PASS (due to having >=1% spam hits, whereas it was previously found
for having ham>spam), but it does prevent listing SPF_HELO_PASS and should, in
theory, help find poorly written ham rules in the future.
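
A rough sketch of that reverse check, under stated assumptions: the function
name, the tuple layout, and the way the two conditions are combined (more spam
than ham, or at least 1% spam hits) are guesses at the idea rather than the
script's actual logic, and the rule names and numbers in the usage example are
hypothetical, not masscheck results.

```python
def suspicious_ham_rules(rules, min_spam_pct=1.0):
    """Return names of negatively scored rules that look like they favour spam.

    rules is a list of (rule, ham_pct, spam_pct, score) tuples.  A rule is
    flagged when its score is negative and it either hits more spam than ham
    or hits at least min_spam_pct of spam (assumed combination of criteria).
    """
    return [
        rule
        for rule, ham_pct, spam_pct, score in rules
        if score < 0 and (spam_pct > ham_pct or spam_pct >= min_spam_pct)
    ]


# Hypothetical rules and hit rates, for illustration only.
example = [
    ("HYPOTHETICAL_HAM_RULE_A", 40.0, 5.0, -0.5),   # flagged: >= 1% spam hits
    ("HYPOTHETICAL_HAM_RULE_B", 2.0, 0.1, -1.0),    # not flagged
    ("HYPOTHETICAL_SPAM_RULE", 0.1, 3.0, 1.5),      # ignored: positive score
]
print(suspicious_ham_rules(example))
```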
