https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
--- Comment #158 from Adam Katz <[email protected]>  2009-11-12 16:20:15 UTC ---
(In reply to comment #157)
> spamassassin-3.2.5
> score HTML_IMAGE_RATIO_02 1.518 0.550 0.573 0.383
> score HTML_IMAGE_RATIO_04 1.561 0.170 0.863 0.172
> score HTML_IMAGE_RATIO_06 0.401 0.001 0.501 0.001
> score HTML_IMAGE_RATIO_08 0.203 0.001 0.179 0.001
>
> attachment 4565
> resulting 50_scores.cf from garescorer runs - V5
> score HTML_IMAGE_RATIO_02 2.199 0.805 1.200 0.437
> score HTML_IMAGE_RATIO_04 2.089 0.610 0.607 0.556
> score HTML_IMAGE_RATIO_06 1.799 0.579 0.901 0.882
> score HTML_IMAGE_RATIO_08 1.410 0.351 0.874 0.021
>
> The old scores showed a more linear relationship, with a sharp drop-off
> between _04 and _06. Our masscheck results indicate _02 and _04 hit on
> more spam than ham, but _06 and _08 are pretty worthless. I think we
> should zero out _06 and _08 while reducing the scores of _02 and _04.

I didn't mention _08 because it wasn't a remarkable enough margin of HAM > SPAM (my script only reports if HAM% + 0.05 > SPAM%) and my hand-sampling utilized S/O ratios under .250 while this rule is .320.
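A minimal Python sketch of the two cutoffs in the paragraph above. It assumes S/O is SPAM% / (SPAM% + HAM%), which reproduces the S/O figures in the masscheck tables of this comment, and it assumes both cutoffs are applied together; `spam_ratio` and `worth_reporting` are hypothetical names, not part of the masscheck tools.

```python
# Hypothetical sketch: the two reporting cutoffs, read literally and combined.
# Assumes S/O = SPAM% / (SPAM% + HAM%).

def spam_ratio(spam_pct: float, ham_pct: float) -> float:
    """S/O: the fraction of a rule's hits that landed on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def worth_reporting(spam_pct: float, ham_pct: float,
                    margin: float = 0.05, max_so: float = 0.250) -> bool:
    """True if a rule passes both the script cutoff and the S/O cutoff."""
    return ham_pct + margin > spam_pct and spam_ratio(spam_pct, ham_pct) < max_so


# SPAM% / HAM% for two rules from the 20091111-r834803-n run
for name, spam_pct, ham_pct in [
    ("HTML_IMAGE_RATIO_08", 0.2709, 0.5491),
    ("HTML_IMAGE_RATIO_06", 0.2612, 0.8321),
]:
    print(name, round(spam_ratio(spam_pct, ham_pct), 3),
          worth_reporting(spam_pct, ham_pct))
```

Under this literal reading, HTML_IMAGE_RATIO_06 is reported and HTML_IMAGE_RATIO_08 is filtered out by the S/O cutoff.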
Still, it has the problem:

SPAM%   HAM%    S/O    RANK  SCORE  NAME                 DateRev
0.2709  0.5491  0.330  0.34  0.20   HTML_IMAGE_RATIO_08  20091111-r834803-n
0.2717  0.5492  0.331  0.34  0.20   HTML_IMAGE_RATIO_08  20091110-r834389-n
0.2672  0.5493  0.327  0.34  0.20   HTML_IMAGE_RATIO_08  20091109-r833997-n
0.2075  0.4995  0.294  0.34  0.20   HTML_IMAGE_RATIO_08  20091104-r832683-n
0.2548  0.5476  0.318  0.34  0.20   HTML_IMAGE_RATIO_08  20091028-r830464-n

Here are the results from the 20091111-r834803-n set, pruning only rules scoring under 0.2 (all hits from my last report are present and asterisked):

S/O   RANK  HAM%      SPAM%  Score in attachment 4565  Rule
.014  .15   0.6328   0.0093  0.001 0.001 0.131 0.700   TVD_RCVD_SPACE_BRACKET*
.015  .24   0.1927   0.0029  0.000 2.099 0.001 1.711   MISSING_MIME_HB_SEP*
.019  .22   0.2528   0.0049  1.482 0.855 2.399 2.399   FUZZY_CPILL*
.043  .29   0.1298   0.0059  0.001 1.699 1.498 1.699   X_IP*
.075  .35   0.0603   0.0049  0.000 0.001 0.308 0.001   HTML_NONELEMENT_30_40
.092  .21   0.8123   0.0825  0.699 0.332 0.480 0.800   MIME_BASE64_BLANKS*
.106  .25   0.2483   0.0293  0.551 1.026 1.033 1.250   CTYPE_001C_B*
.123  .33   0.0837   0.0117  0.001 0.648 0.836 1.293   TVD_FW_GRAPHIC_NAME_LONG
.123  .28   0.1632   0.0229  0.001 2.499 0.392 0.164   DRUGS_MUSCLE(*)
.130  .25   0.3663   0.0547  2.385 0.345 0.998 2.503   FRT_SOMA2*
.155  .29   0.1736   0.0317  0.001 0.001 0.001 1.741   MIME_BASE64_TEXT
.188  .27   0.4622   0.1069  0     0.973 0     2.385   SPF_HELO_FAIL*
.214  .31   0.1449   0.0395  2.200 2.199 0.540 2.199   WEIRD_QUOTING*
.239  .30   0.8321   0.2612  1.799 0.579 0.901 0.882   HTML_IMAGE_RATIO_06*
.254  .34   1.3070   0.4442  1.0                       EXTRA_MPART_TYPE*
.330  .34   0.5491   0.2709  1.410 0.351 0.874 0.021   HTML_IMAGE_RATIO_08
.363  .38   1.0856   0.6194  2.600 2.070 1.233 3.405   DATE_IN_PAST_96_XX
.368  .36   0.3029   0.1767  0.001 0.791 0.001 0.008   UPPERCASE_50_75
.381  .37   0.6473   0.3983  0.354 0.001 0.725 0.428   MIME_HTML_MOSTLY
.660  .51   1.8514   3.5893  0.518 1.625 1.197 1.506   SUBJ_ALL_CAPS
.905  .58   1.0822  10.2987  0     1.246 0     1.347   RCVD_IN_BL_SPAMCOP_NET
.934  .56   3.6172  51.2001  2.199 1.105 1.199 0.723   MIME_HTML_ONLY
.957  .52   2.2200  50.3063  2.399 1.274 1.228 0.793   RDNS_NONE

DRUGS_MUSCLE met all the requirements I set for my last report, but I left it out because it had almost no hits anyway, and it scored very low except on net+no-bayes, so I assumed there was some justification for that score.
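A small Python sketch that recomputes S/O from the HAM% and SPAM% columns of the table above and reports which score set gives each rule its largest score. It assumes SpamAssassin's documented order for four-value `score` lines (no net/no Bayes, net/no Bayes, no net/Bayes, net/Bayes); the rows are copied from the table, and `recompute_so` and `strongest_set` are hypothetical helper names.

```python
# Hypothetical sketch: check the S/O column and find each rule's strongest
# score set. Order assumed from SpamAssassin's four-score convention.

SCORE_SETS = ("no-net+no-bayes", "net+no-bayes", "no-net+bayes", "net+bayes")

# (reported S/O, HAM%, SPAM%, four scores, rule), copied from the table
ROWS = [
    (0.123, 0.1632, 0.0229, (0.001, 2.499, 0.392, 0.164), "DRUGS_MUSCLE"),
    (0.239, 0.8321, 0.2612, (1.799, 0.579, 0.901, 0.882), "HTML_IMAGE_RATIO_06"),
    (0.330, 0.5491, 0.2709, (1.410, 0.351, 0.874, 0.021), "HTML_IMAGE_RATIO_08"),
    (0.957, 2.2200, 50.3063, (2.399, 1.274, 1.228, 0.793), "RDNS_NONE"),
]


def recompute_so(ham_pct: float, spam_pct: float) -> float:
    """S/O = SPAM% / (SPAM% + HAM%)."""
    return spam_pct / (spam_pct + ham_pct)


def strongest_set(scores) -> str:
    """Name of the score set holding the rule's largest score."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return SCORE_SETS[idx]


for reported, ham, spam, scores, rule in ROWS:
    so = recompute_so(ham, spam)
    # S/O is printed to three places in the table, so allow a small tolerance
    assert abs(so - reported) < 0.001, rule
    print(f"{rule:22} S/O={so:.3f} largest score in {strongest_set(scores)}")
```

For DRUGS_MUSCLE this picks the net+no-bayes set, matching the observation in the paragraph above.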
