https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155

--- Comment #158 from Adam Katz <[email protected]> 2009-11-12 16:20:15 UTC ---
(In reply to comment #157)
> spamassassin-3.2.5
> score HTML_IMAGE_RATIO_02 1.518 0.550 0.573 0.383
> score HTML_IMAGE_RATIO_04 1.561 0.170 0.863 0.172
> score HTML_IMAGE_RATIO_06 0.401 0.001 0.501 0.001
> score HTML_IMAGE_RATIO_08 0.203 0.001 0.179 0.001
> 
> attachment 4565 [details]
> resulting 50_scores.cf from garescorer runs - V5
> score HTML_IMAGE_RATIO_02 2.199 0.805 1.200 0.437
> score HTML_IMAGE_RATIO_04 2.089 0.610 0.607 0.556
> score HTML_IMAGE_RATIO_06 1.799 0.579 0.901 0.882
> score HTML_IMAGE_RATIO_08 1.410 0.351 0.874 0.021
> 
> The old scores showed a more linear relationship, with a sharp drop-off
> between _04 and _06.  Our masscheck results indicate _02 and _04 hit on
> more spam than ham, but _06 and _08 are pretty worthless.  I think we
> should zero out _06 and _08 while reducing the scores of _02 and _04.

I didn't mention _08 because its margin of HAM > SPAM wasn't remarkable enough
(my script only reports if HAM% + 0.05 > SPAM%), and my hand-sampling used
S/O ratios under .250 while this rule is at .320.  Still, it has the same problem:

SPAM%   HAM%    S/O    RANK  SCORE NAME                DateRev
0.2709  0.5491  0.330  0.34  0.20  HTML_IMAGE_RATIO_08 20091111-r834803-n
0.2717  0.5492  0.331  0.34  0.20  HTML_IMAGE_RATIO_08 20091110-r834389-n
0.2672  0.5493  0.327  0.34  0.20  HTML_IMAGE_RATIO_08 20091109-r833997-n
0.2075  0.4995  0.294  0.34  0.20  HTML_IMAGE_RATIO_08 20091104-r832683-n
0.2548  0.5476  0.318  0.34  0.20  HTML_IMAGE_RATIO_08 20091028-r830464-n
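
For readers checking the figures, here is a minimal sketch of how the S/O
column relates to the SPAM% and HAM% columns. It assumes S/O is
SPAM% / (SPAM% + HAM%), which agrees with the rows above to within rounding;
the function name s_o_ratio is illustrative and is not taken from the
mass-check tooling.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's combined hit rate that comes from spam (0.0 to 1.0)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule has no hits in either corpus")
    return spam_pct / total


# (DateRev, SPAM%, HAM%) copied from the HTML_IMAGE_RATIO_08 rows above.
ROWS = [
    ("20091111-r834803-n", 0.2709, 0.5491),
    ("20091110-r834389-n", 0.2717, 0.5492),
    ("20091109-r833997-n", 0.2672, 0.5493),
    ("20091104-r832683-n", 0.2075, 0.4995),
    ("20091028-r830464-n", 0.2548, 0.5476),
]

for rev, spam, ham in ROWS:
    print(f"{rev}  S/O = {s_o_ratio(spam, ham):.3f}")
```

A value below 0.5 means the rule hits a larger share of ham than of spam.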

Here are the results from the 20091111-r834803-n set, pruning only rules
scoring under 0.2 (all hits from my last report are present and asterisked):

 S/O RANK HAM%    SPAM%   Score in attachment 4565 Rule
.014 .15  0.6328  0.0093  0.001 0.001 0.131 0.700  TVD_RCVD_SPACE_BRACKET*
.015 .24  0.1927  0.0029  0.000 2.099 0.001 1.711  MISSING_MIME_HB_SEP*
.019 .22  0.2528  0.0049  1.482 0.855 2.399 2.399  FUZZY_CPILL*
.043 .29  0.1298  0.0059  0.001 1.699 1.498 1.699  X_IP*
.075 .35  0.0603  0.0049  0.000 0.001 0.308 0.001  HTML_NONELEMENT_30_40
.092 .21  0.8123  0.0825  0.699 0.332 0.480 0.800  MIME_BASE64_BLANKS*
.106 .25  0.2483  0.0293  0.551 1.026 1.033 1.250  CTYPE_001C_B*
.123 .33  0.0837  0.0117  0.001 0.648 0.836 1.293  TVD_FW_GRAPHIC_NAME_LONG
.123 .28  0.1632  0.0229  0.001 2.499 0.392 0.164  DRUGS_MUSCLE(*)
.130 .25  0.3663  0.0547  2.385 0.345 0.998 2.503  FRT_SOMA2*
.155 .29  0.1736  0.0317  0.001 0.001 0.001 1.741  MIME_BASE64_TEXT
.188 .27  0.4622  0.1069  0 0.973 0 2.385          SPF_HELO_FAIL*
.214 .31  0.1449  0.0395  2.200 2.199 0.540 2.199  WEIRD_QUOTING*
.239 .30  0.8321  0.2612  1.799 0.579 0.901 0.882  HTML_IMAGE_RATIO_06*
.254 .34  1.3070  0.4442  1.0                      EXTRA_MPART_TYPE*
.330 .34  0.5491  0.2709  1.410 0.351 0.874 0.021  HTML_IMAGE_RATIO_08
.363 .38  1.0856  0.6194  2.600 2.070 1.233 3.405  DATE_IN_PAST_96_XX
.368 .36  0.3029  0.1767  0.001 0.791 0.001 0.008  UPPERCASE_50_75
.381 .37  0.6473  0.3983  0.354 0.001 0.725 0.428  MIME_HTML_MOSTLY
.660 .51  1.8514  3.5893  0.518 1.625 1.197 1.506  SUBJ_ALL_CAPS
.905 .58  1.0822 10.2987  0 1.246 0 1.347          RCVD_IN_BL_SPAMCOP_NET
.934 .56  3.6172 51.2001  2.199 1.105 1.199 0.723  MIME_HTML_ONLY
.957 .52  2.2200 50.3063  2.399 1.274 1.228 0.793  RDNS_NONE
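
The kind of check discussed here can be sketched as follows. This is not the
script used for the report (which is not shown in this comment); it is a
hypothetical filter that flags rules hitting a larger percentage of ham than
of spam while still carrying a score of at least 0.2 in some scoreset. The
names RuleStats and ham_leaning_but_scored, and the reading of the 0.2 cutoff
as "highest of the four scores", are assumptions.

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    name: str
    ham_pct: float
    spam_pct: float
    scores: tuple  # the four scoreset values from attachment 4565


def ham_leaning_but_scored(rules, min_score=0.2):
    """Names of rules with HAM% > SPAM% whose highest score is >= min_score."""
    return [
        r.name
        for r in rules
        if r.ham_pct > r.spam_pct and max(r.scores) >= min_score
    ]


# A few rows copied from the table above.
RULES = [
    RuleStats("HTML_IMAGE_RATIO_06", 0.8321, 0.2612, (1.799, 0.579, 0.901, 0.882)),
    RuleStats("HTML_IMAGE_RATIO_08", 0.5491, 0.2709, (1.410, 0.351, 0.874, 0.021)),
    RuleStats("UPPERCASE_50_75", 0.3029, 0.1767, (0.001, 0.791, 0.001, 0.008)),
    RuleStats("MIME_HTML_ONLY", 3.6172, 51.2001, (2.199, 1.105, 1.199, 0.723)),
]

print(ham_leaning_but_scored(RULES))
```

With these four rows, the first three are flagged and MIME_HTML_ONLY is not,
since it hits far more spam than ham.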

DRUGS_MUSCLE met all the requirements I set for my last report, but I removed
it because it had almost no hits anyway and it scored very low except on
net+no-bayes, so I assumed it had some justification there.
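
For reference, the four numbers on a score line correspond to SpamAssassin's
four scoresets, in this order: no network tests and no Bayes, network tests
only, Bayes only, and both. A line with a single number applies that score to
all four. The sketch below labels the values accordingly; the function name
parse_score_line and the label strings are illustrative, not part of
SpamAssassin.

```python
SCORESETS = ("no-net, no-bayes", "net, no-bayes", "no-net, bayes", "net, bayes")


def parse_score_line(line: str):
    """Split 'score NAME v0 [v1 v2 v3]' into (NAME, {scoreset label: value})."""
    parts = line.split()
    if len(parts) not in (3, 6) or parts[0] != "score":
        raise ValueError(f"not a one- or four-value score line: {line!r}")
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value is used for every scoreset.
        values = values * 4
    return name, dict(zip(SCORESETS, values))


name, scores = parse_score_line("score DRUGS_MUSCLE 0.001 2.499 0.392 0.164")
for label, value in scores.items():
    print(f"{name:14s} {label:18s} {value:.3f}")
```

Read this way, the 2.499 for DRUGS_MUSCLE falls in the "net, no-bayes"
scoreset, which is the exception mentioned above.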
