https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6389
--- Comment #15 from Adam Katz 2010-04-12 18:50:51 EDT ---

Just a follow-up because I had some investigations running when this was
closed...

Rules
------------------

# From rulesrc/sandbox/khopesh/20_bug_6389.cf on trunk at r932438
# http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/khopesh/20_bug_6389.cf?revision=932438&view=markup

# just a raw numbers check:
header __HAS_XMIME_AUTOCONV exists:X-MIME-Autoconverted
tflags __HAS_XMIME_AUTOCONV nice

# possible fix to bug 6389
header __MIME_QP_TO_8BIT X-MIME-Autoconverted =~ /from quoted-printable to 8bit/
tflags __MIME_QP_TO_8BIT nice

# John Wilcock's proposed substitutions for __..._ENCODED_B64 (comment 8)
header __FROM_1BYTE_B64 From:raw =~ /=\?(?:iso-8859-1?\d|windows-125\d|koi-8r?)\?B\?/i
header __SUBJ_1BYTE_B64 Subject:raw =~ /=\?(?:iso-8859-1?\d|windows-125\d|koi-8r?)\?B\?/i

meta DOS_HIGHBIT_HDRS_BODY_BUG6389 __FROM_NEEDS_MIME && __SUBJECT_ENCODED_B64 && __FROM_ENCODED_B64 && __SUBJECT_NEEDS_MIME && __HIGHBITS && !__MIME_QP_TO_8BIT

# Daryl O'Shea (DOS) + Adam Katz (KHOP) + John Wilcock version
meta FROM_SUBJ_BODY_8BIT __FROM_NEEDS_MIME && __SUBJ_1BYTE_B64 && __FROM_1BYTE_B64 && __SUBJECT_NEEDS_MIME && __HIGHBITS && !__MIME_QP_TO_8BIT

# assuming recipients won't also be highbit'd ("highbitten?")
header __TO_1BYTE_B64 To:raw =~ /=\?(?:iso-8859-1?\d|windows-125\d|koi-8r?)\?B\?/i
meta FROM_SUBJ_NOTO_BODY_8BIT __FROM_NEEDS_MIME && __SUBJ_1BYTE_B64 && __FROM_1BYTE_B64 && __SUBJECT_NEEDS_MIME && __HIGHBITS && !__MIME_QP_TO_8BIT && !__TO_1BYTE_B64

Results from 2010-04-11 (non-net run)
------------------

http://ruleqa.spamassassin.org/20100411-r932853-n/%2FDOS_HIGHB|MIME_QP_TO_|HAS_XMIME_|_1BYTE_B64|_ENCODED_B64|FROM_SUBJ_

 SPAM%    HAM%    S/O   RANK  SCORE  NAME
1.1775  0.0359  0.970   0.82   0.01  T_DOS_HIGHBIT_HDRS_BODY_BUG6389
0.0718  0.0021  0.972   0.66   0.01  T_FROM_SUBJ_BODY_8BIT
0.0714  0.0021  0.972   0.66   0.01  T_FROM_SUBJ_NOTO_BODY_8BIT
0.5069  0.2155  0.702   0.62  (n/a)  __SUBJ_1BYTE_B64
0.0928  0.1333  0.410   0.53  (n/a)  __FROM_1BYTE_B64
2.3337  2.3339  0.500   0.51  (n/a)  __SUBJECT_ENCODED_B64
1.3552  1.7032  0.443   0.50  (n/a)  __FROM_ENCODED_B64
0.0004  0.1519  0.003   0.31  (n/a)  __TO_1BYTE_B64
6.2081  1.0613  0.854   0.24  (n/a)  __HAS_XMIME_AUTOCONV
6.1458  0.9837  0.862   0.24  (n/a)  __MIME_QP_TO_8BIT

That rules out the suggestions from comment 8. Because Daryl removed the
original rule, it's not listed here, but my modification did little to nothing.

A breakdown of T_DOS_HIGHBIT_HDRS_BODY_BUG6389 scores:

scoremap ham:   0  79.31%    69  *******************************
scoremap ham:   1   3.45%     3  *
scoremap ham:   2  16.09%    14  ******
scoremap ham:   3   1.15%     1
scoremap spam:  0   2.85%   413  *
scoremap spam:  1   0.15%    22
scoremap spam:  2  18.89%  2734  *******
scoremap spam:  3   3.70%   536  *
scoremap spam:  4   4.40%   637  *
scoremap spam:  5  12.40%  1794  ****
scoremap spam:  6   5.51%   797  **
scoremap spam:  7   7.81%  1130  ***
scoremap spam:  8  10.22%  1479  ****
scoremap spam:  9   5.66%   819  **
scoremap spam: 10   7.17%  1037  **
scoremap spam: 11   5.80%   839  **
scoremap spam: 12   4.35%   629  *
scoremap spam: 13   2.74%   396  *
scoremap spam: 14   2.64%   382  *
scoremap spam: 15   1.53%   221
scoremap spam: 16   1.29%   187
scoremap spam: 17   0.98%   142
scoremap spam: 18   0.53%    76
scoremap spam: 19   0.53%    76
scoremap spam: 20   0.27%    39
scoremap spam: 21   0.20%    29
scoremap spam: 22   0.12%    17
scoremap spam: 23   0.08%    12
scoremap spam: 24   0.10%    15
scoremap spam: 25   0.01%     2
scoremap spam: 26   0.01%     2
scoremap spam: 28   0.02%     3
scoremap spam: 29   0.01%     2
scoremap spam: 30   0.01%     1
scoremap spam: 32   0.02%     3
scoremap spam: 33   0.01%     1

Overlap Spam (50% and up)

x% of this rule x                also hit this rule y       y% of y also hit x
76%  T_DOS_HIGHBIT_HDRS...6389   T_FSL_HELO_NON_FQDN_2      1%
72%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_PBL                1%
68%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_XBL                1%
55%  T_DOS_HIGHBIT_HDRS...6389   RAZOR2_CHECK               0%
53%  T_DOS_HIGHBIT_HDRS...6389   RAZOR2_CF_RANGE_51_100     0%
53%  T_DOS_HIGHBIT_HDRS...6389   RDNS_NONE                  1%
51%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_BL_SPAMCOP_NET     1%

Note that despite this being a non-net run, RDNS_NONE is the only (published)
non-net rule with more than 50% overlap. In a scan completely lacking network
tests, the score-map would be even lower and the rule would appear more
valuable.

Results from 2010-04-10 (net run)
------------------

http://ruleqa.spamassassin.org/20100410-r932679-n/%2FDOS_HIGHB%7CMIME_QP_TO_%7CHAS_XMIME_%7C_1BYTE_B64%7C_ENCODED_B64%7CFROM_SUBJ_

 SPAM%    HAM%    S/O   RANK  SCORE  NAME
1.1755  0.0116  0.990   0.86   0.01  T_DOS_HIGHBIT_HDRS_BODY_BUG6389
0.5164  0.0390  0.930   0.76  (n/a)  __SUBJ_1BYTE_B64
0.0685  0       1.000   0.66   0.01  T_FROM_SUBJ_BODY_8BIT
0.0682  0       1.000   0.66   0.01  T_FROM_SUBJ_NOTO_BODY_8BIT
0.0854  0.0435  0.663   0.61  (n/a)  __FROM_1BYTE_B64
2.3165  2.0477  0.531   0.52  (n/a)  __SUBJECT_ENCODED_B64
1.3498  1.6534  0.449   0.51  (n/a)  __FROM_ENCODED_B64
0.0004  0.0099  0.039   0.47  (n/a)  __TO_1BYTE_B64
6.2616  1.1081  0.850   0.23  (n/a)  __HAS_XMIME_AUTOCONV
6.1999  1.0350  0.857   0.23  (n/a)  __MIME_QP_TO_8BIT

A breakdown of T_DOS_HIGHBIT_HDRS_BODY_BUG6389 scores:

scoremap ham:  -2  65.38%    17  **************************
scoremap ham:   0  26.92%     7  **********
scoremap ham:   1   3.85%     1  *
scoremap ham:   4   3.85%     1  *
scoremap spam:  0   0.05%     7
scoremap spam:  1   0.20%    29
scoremap spam:  2   0.78%   113
scoremap spam:  3   0.55%    80
scoremap spam:  4   1.11%   161
scoremap spam:  5   1.56%   226
scoremap spam:  6   2.57%   373  *
scoremap spam:  7   3.76%   546  *
scoremap spam:  8   4.86%   705  *
scoremap spam:  9   6.58%   955  **
scoremap spam: 10   7.68%  1114  ***
scoremap spam: 11   8.85%  1284  ***
scoremap spam: 12   8.48%  1230  ***
scoremap spam: 13   8.19%  1188  ***
scoremap spam: 14   8.07%  1171  ***
scoremap spam: 15   6.81%   989  **
scoremap spam: 16   6.02%   873  **
scoremap spam: 17   5.29%   767  **
scoremap spam: 18   4.36%   632  *
scoremap spam: 19   3.41%   495  *
scoremap spam: 20   2.56%   371  *
scoremap spam: 21   2.06%   299
scoremap spam: 22   1.45%   211
scoremap spam: 23   1.13%   164
scoremap spam: 24   0.87%   126
scoremap spam: 25   0.74%   108
scoremap spam: 26   0.59%    85
scoremap spam: 27   0.28%    40
scoremap spam: 28   0.19%    27
scoremap spam: 29   0.10%    14
scoremap spam: 30   0.24%    35
scoremap spam: 31   0.10%    14
scoremap spam: 32   0.11%    16
scoremap spam: 33   0.10%    14
scoremap spam: 34   0.05%     7
scoremap spam: 35   0.08%    11
scoremap spam: 36   0.08%    12
scoremap spam: 37   0.03%     4
scoremap spam: 38   0.03%     4
scoremap spam: 39   0.01%     2
scoremap spam: 40   0.03%     4
scoremap spam: 41   0.01%     2
scoremap spam: 42   0.01%     2
scoremap spam: 43   0.01%     1
scoremap spam: 47   0.01%     1

Overlap Spam (50% and up)

x% of this rule x                also hit this rule y        y% of y also hit x
95%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_BRBL_LASTEXT        1%
76%  T_DOS_HIGHBIT_HDRS...6389   T_FSL_HELO_NON_FQDN_2       1%
73%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_PBL                 1%
68%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_XBL                 1%
61%  T_DOS_HIGHBIT_HDRS...6389   T_RCVD_IN_ANBREP_BL         1%
56%  T_DOS_HIGHBIT_HDRS...6389   RAZOR2_CHECK                0%
54%  T_DOS_HIGHBIT_HDRS...6389   RAZOR2_CF_RANGE_51_100      0%
53%  T_DOS_HIGHBIT_HDRS...6389   RDNS_NONE                   1%
51%  T_DOS_HIGHBIT_HDRS...6389   RCVD_IN_BL_SPAMCOP_NET      1%
50%  T_DOS_HIGHBIT_HDRS...6389   RAZOR2_CF_RANGE_E4_51_100   3%

Conclusion
------------------

This rule is not worthwhile in network-enabled checks. Without network tests,
this rule may be extremely valuable.

Assuming we're interested in developing offline-only tests, this is worth
revisiting once we have more corpora from areas that use non-Latin character
sets (specifically China), especially if we can pin it to not fire on network
tests.

I have removed the tests from SVN (satisfying comment #14). They will
disappear from the ruleqa system in the next day or two.

$ svn delete --force 20_bug_6389.cf
D         20_bug_6389.cf
$ svn commit -m "Bug closed. I posted my observations, including this file's contents and stats for ent and non-net runs, on bug 6389, comment 14" 20_bug_6389.cf
Deleting       20_bug_6389.cf

Committed revision 933340.
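
For anyone re-checking the ruleqa tables in this comment: the S/O column is consistent with SPAM% / (SPAM% + HAM%), rounded to three places. The sketch below is not part of ruleqa; the function name `so_ratio` and the handling of a rule that hits nothing are assumptions made for illustration. The sample rows are copied from the tables in this comment.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that are spam, from the SPAM% and HAM% columns.

    Hypothetical helper: it reproduces the S/O values quoted in this
    comment, but it is not taken from the ruleqa source.
    """
    total = spam_pct + ham_pct
    if total == 0:
        # Assumption: treat a rule with no hits at all as neutral.
        return 0.5
    return spam_pct / total


# (name, SPAM%, HAM%) rows copied from the two result tables.
rows = [
    ("T_DOS_HIGHBIT_HDRS_BODY_BUG6389 (non-net)", 1.1775, 0.0359),
    ("T_DOS_HIGHBIT_HDRS_BODY_BUG6389 (net)", 1.1755, 0.0116),
    ("__SUBJ_1BYTE_B64 (non-net)", 0.5069, 0.2155),
    ("T_FROM_SUBJ_BODY_8BIT (net)", 0.0685, 0.0),
]

for name, spam_pct, ham_pct in rows:
    print(f"{so_ratio(spam_pct, ham_pct):.3f}  {name}")
```

An S/O near 0.5 (as with __SUBJECT_ENCODED_B64) means the rule hits spam and ham about equally often, which is why the sub-rules alone say little and only the combined meta rule separates the two.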
