https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #122 from Adam Katz 2009-10-22 13:32:40 UTC ---
Some bugs in the auto-generated scores from attachment 4553:

HTML_MESSAGE scores WAY too high: up to 2.199, even though it hits 40.9% of ham.
There are others too.

Full list as of right now:


   MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0   0.1848   4.8675   0.037    0.78    0.00  SPF_HELO_PASS
       0   0.3294   5.5859   0.056    0.74    0.00  SPF_PASS
       0  12.2476   1.2568   0.907    0.58    0.00  RCVD_IN_BL_SPAMCOP_NET
       0  50.4453   3.7391   0.931    0.57    2.30  MIME_HTML_ONLY
       0  49.9300  12.1231   0.805    0.52    0.10  RDNS_NONE
       0   3.8466   1.8427   0.676    0.51    2.30  SUBJ_ALL_CAPS
       0   2.3989   1.3218   0.645    0.50    0.00  UNPARSEABLE_RELAY
       0  83.7769  40.8865   0.672    0.49    0.00  HTML_MESSAGE
       0   3.4477   3.8932   0.470    0.47    2.50  MIME_QP_LONG_LINE
       0  12.2361  15.6252   0.439    0.46    0.00  FREEMAIL_FROM
       0   0.7695   1.2102   0.389    0.41    2.90  TVD_SPACE_RATIO
       0   0.4610   1.2409   0.271    0.35    1.00  EXTRA_MPART_TYPE
       0   0.0271   1.0700   0.025    0.15    1.22  MSGID_MULTIPLE_AT
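
For reference, S/O in the table is the fraction of a rule's hits that land on
spam, S/O = SPAM% / (SPAM% + HAM%). HTML_MESSAGE, for example, gives
83.7769 / (83.7769 + 40.8865) ≈ 0.672, matching its row. Here is a minimal awk
sketch that recomputes the column from rows in the layout above; so_ratio is a
hypothetical helper, not part of SpamAssassin:

```shell
# so_ratio (hypothetical helper): read ruleqa-style rows on stdin, in the
# column order MSECS SPAM% HAM% S/O RANK SCORE NAME, and print each rule's
# NAME with SPAM% / (SPAM% + HAM%) to three decimal places.
so_ratio() {
  # Non-numeric header fields such as "SPAM%" evaluate to 0 in arithmetic,
  # so the header row fails the "> 0" test and is skipped.
  awk '$7 ~ /^[A-Z]/ && $2 + $3 > 0 {
    printf "%s %.3f\n", $7, $2 / ($2 + $3)
  }'
}

printf '%s\n' \
  '       0  83.7769  40.8865   0.672    0.49    0.00  HTML_MESSAGE' \
  '       0   0.3294   5.5859   0.056    0.74    0.00  SPF_PASS' \
  | so_ratio
# HTML_MESSAGE 0.672
# SPF_PASS 0.056
```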

score SPF_HELO_PASS -0.001
score SPF_PASS -0.001
score RCVD_IN_BL_SPAMCOP_NET 0 1.725 0 1.180 # n=2
score MIME_HTML_ONLY 1.474 0.737 0.829 0.462
score RDNS_NONE             0.1
score SUBJ_ALL_CAPS 0.264 1.568 0.593 1.045
score UNPARSEABLE_RELAY 0.001
score HTML_MESSAGE 2.199 0.838 1.473 0.511
score MIME_QP_LONG_LINE 0.074 0.242 0.116 0.002
score FREEMAIL_FROM 0.817 1.020 0.401 0.856
score TVD_SPACE_RATIO 0.001 0.201 0.398 0.001
score MSGID_MULTIPLE_AT 0.001 0.001 0.598 0.000
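
In these score lines, a single value applies everywhere, while four values give
one score per SpamAssassin score set (0: no Bayes, no network tests; 1: network
tests only; 2: Bayes only; 3: both). That is why the network test
RCVD_IN_BL_SPAMCOP_NET gets 0 in sets 0 and 2. A minimal sketch for reading one
set's score out of a scores file, assuming the plain "score NAME v0 [v1 v2 v3]"
layout shown above; score_for is a hypothetical helper:

```shell
# score_for (hypothetical helper): print the score that a "score" line in a
# SpamAssassin .cf file assigns to RULE under score set SET (0-3).
#   set 0: Bayes off, network tests off    set 1: Bayes off, network on
#   set 2: Bayes on,  network tests off    set 3: Bayes on,  network on
# A line with a single value applies to every set.
score_for() {
  awk -v rule="$1" -v scoreset="$2" '
    $1 == "score" && $2 == rule {
      # Collect values, stopping at a trailing "# ..." comment
      n = 0
      for (i = 3; i <= NF && $i !~ /^#/; i++) v[n++] = $i
      print (n == 1 ? v[0] : v[scoreset])
      exit
    }'
}

cf='score RCVD_IN_BL_SPAMCOP_NET 0 1.725 0 1.180 # n=2
score RDNS_NONE             0.1
score HTML_MESSAGE 2.199 0.838 1.473 0.511'

printf '%s\n' "$cf" | score_for HTML_MESSAGE 0            # 2.199
printf '%s\n' "$cf" | score_for RCVD_IN_BL_SPAMCOP_NET 3  # 1.180
printf '%s\n' "$cf" | score_for RDNS_NONE 2               # 0.1
```

Run against the attachment saved as /tmp/50_scores_newest.cf, e.g.
"score_for HTML_MESSAGE 0 < /tmp/50_scores_newest.cf", it should return the
same 2.199 as the HTML_MESSAGE line above.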


To fetch them for yourself (to get something more up to date, or from a
different URL, etc.), here's the code I ran (sorry, I know POSIX shell better
than Perl, so I dip into both):

# Keep the header plus every non-T_ rule whose HAM% is 1% or more
elinks -dump http://ruleqa.spamassassin.org/ \
  | perl -ne 'print if /(\s+[\d.]+){2}\s+[1-9][\d.]+(\s+[\d.]+){3}\s+(?!T_)\w|\sMSECS/' \
  | tee rules.txt

# Look up each of those rules in the generated scores file
for rule in $(perl -ne 'if (/.*\s([A-Z]+\w*_\w*)/) { s//$1/; print; }' < rules.txt); do
  grep "^[^#]* $rule " /tmp/50_scores_newest.cf
done


That could probably be written better, e.g. by also looking for ham% > spam%
in addition to ham% > 0.9999%, but it's a good first pass.
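
One way to widen the filter as suggested, flagging a rule when HAM% is at least
1% or when HAM% exceeds SPAM%, is a column-based awk test instead of the
full-row regex. This is a sketch under the assumption that the input rows follow
the MSECS SPAM% HAM% S/O RANK SCORE NAME layout; suspect_rules is a hypothetical
name:

```shell
# suspect_rules (hypothetical helper): read ruleqa-style rows on stdin
# (MSECS SPAM% HAM% S/O RANK SCORE NAME) and print the NAME of every
# non-T_ rule that hits at least 1% of ham OR hits more ham than spam.
suspect_rules() {
  # The numeric guards on $2 and $3 skip the header row; "+ 0" forces
  # numeric (not string) comparison.
  awk '$2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $7 ~ /^[A-Z]/ && $7 !~ /^T_/ {
    spam = $2 + 0; ham = $3 + 0
    if (ham >= 1 || ham > spam) print $7
  }'
}

printf '%s\n' \
  '       0  12.2476   1.2568   0.907    0.58    0.00  RCVD_IN_BL_SPAMCOP_NET' \
  '       0   0.0271   1.0700   0.025    0.15    1.22  MSGID_MULTIPLE_AT' \
  | suspect_rules
# RCVD_IN_BL_SPAMCOP_NET
# MSGID_MULTIPLE_AT
```

Because it matches on column positions, it assumes the input is limited to rows
in that layout; other text on the ruleqa page may need filtering out first,
after which its output could feed the grep loop in place of the second Perl
one-liner.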

Obviously, /removing/ fixed scores for things like RDNS_NONE can't possibly be
considered until the GA (genetic algorithm) is a little better at figuring this
sort of thing out.
